adriangb commented on code in PR #15770:
URL: https://github.com/apache/datafusion/pull/15770#discussion_r2145894014
##########
datafusion/physical-optimizer/src/enforce_sorting/sort_pushdown.rs:
##########
@@ -70,6 +71,8 @@ pub fn assign_initial_requirements(sort_push_down: &mut
SortPushDown) {
// If the parent has a fetch value, assign it to the children
// Or use the fetch value of the child.
fetch: child.plan.fetch(),
+ // If the parent has a filter, assign it to the children
Review Comment:
That's where the actual pushdown is happening! This part is just so that
that pushdown is not lost.
Basically: EnforceSorting wants to run last / late:
https://github.com/apache/datafusion/blob/61a2dfdc71508bd0b09abaefdc882d33a9e0d91e/datafusion/physical-optimizer/src/optimizer.rs#L111
Since `FilterPushdown` can absolutely modify the plan tree (e.g. by removing
a `FilterExec`) it needs to run _before_ this (maybe we just shouldn't remove
FilterExec and just make it a no-op, but that might loose optimizations e.g.
because `CoalesceBatchesExec` can't then be removed).
But at the same time `EnforceSorting` removes and re-adds `SortExec`'s
multiple times.
So what we are doing here is passing around the filters that were already
pushed down so that the connection between the SortExec and the DataSourceExec
is not lost. This is similar to what is happening with `limit` which is also
being passed around.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]