mingmwang commented on PR #4439: URL: https://github.com/apache/arrow-datafusion/pull/4439#issuecomment-1333451112
> @mingmwang, even though @mustafasrepo's example shows how the issue can manifest in a certain context, the core issue is that `self.input.output_ordering()` can return a result that is inconsistent with `self.required_input_ordering()`. In his specific case, the former returns `None`, which is clearly wrong -- the requirement would have been violated had this been the case.
>
> This seems to happen because as of PR #4122, we insert operators like `SortExec` _after_ physical planning (i.e. during the `BasicEnforcement` optimization step). Therefore, calls like `input_exec.output_ordering()` return such inconsistent/unreliable results during physical planning, since the "real" `input_exec` is not there yet!
>
> Until we figure out a general fix to this, we can select the "finer" ordering between `required_input_ordering()` and `input.output_ordering()` -- this will result in a less wrong result in the meantime.

There is a general fix. The root cause is not that `output_ordering()` returns `None`. Rather, when `WindowAggExec` derives its ordering from its input, the column indices must be handled properly. The same bug exists in the output partitioning. I will raise a PR to fix it.
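
To illustrate the column-index point, here is a minimal sketch; it is not DataFusion's actual code. It assumes an operator whose output schema places some number of its own columns ahead of the input's columns, so that an ordering expressed against the input schema has to be shifted before it is valid against the output schema. `SortKey` and `shift_ordering` are hypothetical names made up for this example.

```rust
/// Hypothetical stand-in for a sort key that refers to a column by position.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column_index: usize,
    pub descending: bool,
}

/// Re-express an ordering given against the input schema in terms of the
/// output schema, ASSUMING the operator puts `offset` of its own columns
/// in front of the input columns. `None` (no known ordering) stays `None`.
pub fn shift_ordering(
    input_ordering: Option<&[SortKey]>,
    offset: usize,
) -> Option<Vec<SortKey>> {
    input_ordering.map(|keys| {
        keys.iter()
            .map(|key| SortKey {
                // Input column `i` is found at position `i + offset` in the output.
                column_index: key.column_index + offset,
                descending: key.descending,
            })
            .collect()
    })
}
```

Under that assumption, reusing the input's ordering unchanged would point at the wrong output columns, and the same kind of remapping would apply to any partitioning expressed in terms of input columns.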