adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2145894014
########## datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs: ########## @@ -70,6 +71,8 @@ pub fn assign_initial_requirements(sort_push_down: &mut SortPushDown) { // If the parent has a fetch value, assign it to the children // Or use the fetch value of the child. fetch: child.plan.fetch(), + // If the parent has a filter, assign it to the children Review Comment: That's where the actual pushdown is happening! This part is just so that that pushdown is not lost. Basically: EnforceSorting wants to run last / late: https://github.com/apache/datafusion/blob/61a2dfdc71508bd0b09abaefdc882d33a9e0d91e/datafusion/physical-optimizer/src/optimizer.rs#L111 Since `FilterPushdown` can absolutely modify the plan tree (e.g. by removing a `FilterExec`) it needs to run _before_ this (maybe we just shouldn't remove FilterExec and just make it a no-op, but that might loose optimizations e.g. because `CoalesceBatchesExec` can't then be removed). But at the same time `EnforceSorting` removes and re-adds `SortExec`'s multiple times. So what we are doing here is passing around the filters that were already pushed down so that the connection between the SortExec and the DataSourceExec is not lost. This is similar to what is happening with `limit` which is also being passed around. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org