mingmwang commented on PR #5290: URL: https://github.com/apache/arrow-datafusion/pull/5290#issuecomment-1438600953
> @mingmwang I went through this PR in detail. Thanks for the great work. Let me summarize my observations. > > The Top-Down approach is better at pushing down `SortExec` s if it is helpful. However, the Bottom-up approach is better at producing pipeline-friendly plans when it is possible. > > Since some tests improve and some regress with this change. I have constructed a unified test bench for these rules (I use union of the tests for `SortEnforcement` and `TopDownSortEnforcement`). The test bench can be found in the [PR](https://github.com/synnada-ai/arrow-datafusion/pull/51). PR body also includes the performance comparison of both rules on the unified test bench. I think we can reach to a rule where all tests in the test bench pass. I will try to do so myself. I encourage you to try it also. > > In the meantime, This PR has nice changes that do not need to wait for final rule version. Specifically, printing global flag in `SortExec` and api change from `fn required_input_ordering(&self) -> Vec<Option<&[PhysicalSortExpr]>>` to `fn required_input_ordering(&self) -> Vec<Option<Vec<PhysicalSortRequirements>>>` don't need to wait. I think you can file another PR with these changes. Thanks a lot, I will take a closer look at your comparing result tomorrow. I guess most of the failure case in Top-Down rule is because the current implementation try to keep the final output ordering and will not remove all the SortExec very aggressively. I will also resolve all the review comments tomorrow. Regarding split the PR, I do want to split it into two, because in this PR actually besides the new rule implementation and the added new tests, other changes are very tiny. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org