Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

via GitHub Thu, 10 Apr 2025 10:04:32 -0700


berkaysynnada commented on PR #15566:
URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2792203825


    > The TLDR is that because each filter may be allowed or not, and maybe 
transformed, to be pushed down into each child we end up with a matrix of 
filters x children that we need to ask the node for. For example, imagine 
joins, each of its children needs to be treated differently and only some 
filters can be pushed into some children. That meant that as you suggest it had 
to be at least 2 methods on ExecutionPlan (2 with complex APIs or 3 simpler 
ones). 
   
   You can properly (and IMO more easily) apply the pushdown logic in the rule 
recursion as well. There are many rules doing that.
   
   > The recursion is also a bit wonky because you need to:
   > 
   > 1. Recurse down and on your way up transmit not only the new nodes but 
also the filter support result.
   
   That's exactly the intended usage of PlanContext
   
   > 2. There's jumps, eg when you hit a node that doesn't support pushdown but 
still want to recurse into sub-trees.
   
   That's not a problem as well in transform_down() when you say 
Recursion::Continue and Transformed::No
   
   > 3. You're carrying around non-trivial context as you go down.
   
   That's now a problem as well in PlanContext.
   
   > Having implemented it both ways I've come to the conclusion that trying to 
generalize the recursion into the optimizer rules makes things harder for both 
the simple cases (FilterExec, RepartitionExec) and the complex cases 
(HashJoinExec, ProjectionExec). Doing it this way makes the simple 
implementations trivial and the complex ones will be complex but will be self 
contained and not have to fit into some rigid APIs.
   
   
   You already keep the operator specific parts in the ExecutionPlan API.  For 
example if I give an example on the simplest case:
   ```rust
   pub fn try_pushdown_filters_to_input(
       plan: &Arc<dyn ExecutionPlan>,
       input: &Arc<dyn ExecutionPlan>,
       parent_filters: &[PhysicalExprRef],
       config: &ConfigOptions,
   ) -> Result<FilterPushdownResult<Arc<dyn ExecutionPlan>>> {
       match input.try_pushdown_filters(input, parent_filters, config)? {
           FilterPushdownResult::NotPushed => {
               // No pushdown possible, keep this child as is
               Ok(FilterPushdownResult::NotPushed)
           }
           FilterPushdownResult::Pushed {
               updated: inner,
               support,
           } => {
               // We have a child that has pushed down some filters
               let new_inner =
                   with_new_children_if_necessary(Arc::clone(plan), 
vec![inner])?;
               Ok(FilterPushdownResult::Pushed {
                   updated: new_inner,
                   support,
               })
           }
       }
   }
   ```
   
   Here, you will not need to do the recursive call, and the last part of 
with_new_children...() logic. As you've mentioned, maybe a single API wouldn't 
be enough, but perhaps having 2 or 3 is what we really need to comply 
separation of concerns.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

Reply via email to