Jimexist commented on pull request #403:
URL: https://github.com/apache/arrow-datafusion/pull/403#issuecomment-850050234


   > > Thank you @Jimexist  -- the  code structure looks clean to me . Nice work
   > > 
   > > 
   > > 
   > > However, I am concerned about the correctness of these results. As I 
understand it, `first_value`, `last_value` and `nth_value` are not well defined 
unless there is an ordering on the window (aka without an ordering you 
basically can get some arbitrary value from the window). 
   > > 
   > > 
   > > 
   > > I wonder if it would make sense to implement ordering for windows first, 
so we can write tests will well defined output
   > > 
   > > 
   > > 
   > > I also see some change to the `parquet-testing` module which I wonder if 
that was intended
   > 
   > I guess it's not arbitrary but rather just take the ordering as is. When 
https://github.com/apache/arrow-datafusion/pull/425 is merged I'll add one test 
case to compare with psql so the behavior is consistent.
   > 
   > Of course when ordering clause is implemented then the behavior can also 
be tested in the same way, along with some unit test.
   > 
   > I plan to implement ordering after this and the lead/lag PR are merged 
because it requires some structural changes to the planner.
   
   See also https://github.com/apache/arrow-datafusion/pull/429
   
   Both can be independently merged first because with the sort clause 
implemented these two logic shall stay unchanged as sorting happens as a 
separate physical plan that precedes these and feeds immediately to these.
   
   Regarding the submodule change it's not intended - will revert.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to