mingmwang commented on PR #4439:
URL: 
https://github.com/apache/arrow-datafusion/pull/4439#issuecomment-1333451112

   > @mingmwang, even though @mustafasrepo's example shows how the issue can 
manifest in a certain context, the core issue is that 
`self.input.output_ordering()` can return a result that is inconsistent with 
`self.required_input_ordering()`. In his specific case, the former returns 
`None`, which is clearly wrong -- the requirement would have been violated had 
this been the case.
   > 
   > This seems to happen because as of PR #4122, we insert operators 
like `SortExec` _after_ physical planning (i.e. during the `BasicEnforcement` 
optimization step). Therefore, calls like `input_exec.output_ordering()` return 
such inconsistent/unreliable results during physical planning, since the "real" 
`input_exec` is not there yet!
   > 
   > Until we figure out a general fix to this, we can select the "finer" 
ordering between `required_input_ordering()` and `input.output_ordering()` -- 
this will result in a less wrong result in the meantime.
   
   There is a general fix. The root cause is not that `output_ordering()` 
returns `None`; rather, when `WindowAggExec` derives its ordering from its 
input, the column indices must be handled properly. The same bug exists in 
the output partitioning.
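
   To illustrate the kind of index handling meant here, below is a minimal, 
self-contained sketch. It does not use DataFusion's actual types: `Column`, 
`SortExpr` and `shift_ordering` are hypothetical stand-ins, and it assumes an 
operator whose output schema places `num_prepended` new columns in front of 
the input columns, so that an input column at index `i` sits at index 
`i + num_prepended` in the output.

```rust
/// Hypothetical stand-in for a physical column reference: a name plus the
/// position of that column in the schema it was resolved against.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
    index: usize,
}

/// Hypothetical stand-in for a single sort key.
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: Column,
    descending: bool,
}

/// Re-express an ordering that was resolved against the input schema so that
/// it is valid against the operator's output schema.
///
/// Assumption: the operator puts `num_prepended` new columns *before* the
/// input columns, so every input index moves right by that amount.
fn shift_ordering(input_ordering: &[SortExpr], num_prepended: usize) -> Vec<SortExpr> {
    input_ordering
        .iter()
        .map(|expr| SortExpr {
            column: Column {
                name: expr.column.name.clone(),
                index: expr.column.index + num_prepended,
            },
            descending: expr.descending,
        })
        .collect()
}
```

   Under that assumption, an input ordered by column `a` at index 0, passed 
through an operator that prepends two columns, would report its ordering on 
`a` at index 2. Returning the input ordering unchanged would instead point at 
whatever column now occupies index 0. The same kind of remapping would apply 
to column references inside partitioning expressions.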
   
   
   
   I will raise a PR to fix the bug. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
