berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2792203825
> The TLDR is that because each filter may be allowed or not, and maybe transformed, to be pushed down into each child we end up with a matrix of filters x children that we need to ask the node for. For example, imagine joins, each of its children needs to be treated differently and only some filters can be pushed into some children. That meant that as you suggest it had to be at least 2 methods on ExecutionPlan (2 with complex APIs or 3 simpler ones). You can properly (and IMO more easily) apply the pushdown logic in the rule recursion as well. There are many rules doing that. > The recursion is also a bit wonky because you need to: > > 1. Recurse down and on your way up transmit not only the new nodes but also the filter support result. That's exactly the intended usage of PlanContext > 2. There's jumps, eg when you hit a node that doesn't support pushdown but still want to recurse into sub-trees. That's not a problem as well in transform_down() when you say Recursion::Continue and Transformed::No > 3. You're carrying around non-trivial context as you go down. That's now a problem as well in PlanContext. > Having implemented it both ways I've come to the conclusion that trying to generalize the recursion into the optimizer rules makes things harder for both the simple cases (FilterExec, RepartitionExec) and the complex cases (HashJoinExec, ProjectionExec). Doing it this way makes the simple implementations trivial and the complex ones will be complex but will be self contained and not have to fit into some rigid APIs. You already keep the operator specific parts in the ExecutionPlan API. For example if I give an example on the simplest case: ```rust pub fn try_pushdown_filters_to_input( plan: &Arc<dyn ExecutionPlan>, input: &Arc<dyn ExecutionPlan>, parent_filters: &[PhysicalExprRef], config: &ConfigOptions, ) -> Result<FilterPushdownResult<Arc<dyn ExecutionPlan>>> { match input.try_pushdown_filters(input, parent_filters, config)? { FilterPushdownResult::NotPushed => { // No pushdown possible, keep this child as is Ok(FilterPushdownResult::NotPushed) } FilterPushdownResult::Pushed { updated: inner, support, } => { // We have a child that has pushed down some filters let new_inner = with_new_children_if_necessary(Arc::clone(plan), vec![inner])?; Ok(FilterPushdownResult::Pushed { updated: new_inner, support, }) } } } ``` Here, you will not need to do the recursive call, and the last part of with_new_children...() logic. As you've mentioned, maybe a single API wouldn't be enough, but perhaps having 2 or 3 is what we really need to comply separation of concerns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org