Jimexist commented on pull request #403: URL: https://github.com/apache/arrow-datafusion/pull/403#issuecomment-850050234
> > Thank you @Jimexist -- the code structure looks clean to me . Nice work > > > > > > > > However, I am concerned about the correctness of these results. As I understand it, `first_value`, `last_value` and `nth_value` are not well defined unless there is an ordering on the window (aka without an ordering you basically can get some arbitrary value from the window). > > > > > > > > I wonder if it would make sense to implement ordering for windows first, so we can write tests will well defined output > > > > > > > > I also see some change to the `parquet-testing` module which I wonder if that was intended > > I guess it's not arbitrary but rather just take the ordering as is. When https://github.com/apache/arrow-datafusion/pull/425 is merged I'll add one test case to compare with psql so the behavior is consistent. > > Of course when ordering clause is implemented then the behavior can also be tested in the same way, along with some unit test. > > I plan to implement ordering after this and the lead/lag PR are merged because it requires some structural changes to the planner. See also https://github.com/apache/arrow-datafusion/pull/429 Both can be independently merged first because with the sort clause implemented these two logic shall stay unchanged as sorting happens as a separate physical plan that precedes these and feeds immediately to these. Regarding the submodule change it's not intended - will revert. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
