adriangb commented on PR #15566:
URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2789742021

   @berkaysynnada the original implementation had recursion in the optimizer 
rule. It was @alamb suggestion to do it this way, which mirrors the existing 
`try_swap_with_projection` method.
   
   The TLDR is that because each filter may be allowed or not, and maybe 
transformed, to be pushed down into each child we end up with a matrix of 
filters x children that we need to ask the node for. For example, imagine 
joins, each of its children needs to be treated differently and only some 
filters can be pushed into some children. That meant that as you suggest it had 
to be at least 2 methods on ExecutionPlan (2 with complex APIs or 3 simpler 
ones). The recursion is also a bit wonky because you need to:
   1. Recurse down and on your way up transmit not only the new nodes but also 
the filter support result.
   2. There's jumps, eg when you hit a node that doesn't support pushdown but 
still want to recurse into sub-trees.
   3. You're carrying around non-trivial context as you go down.
   
   Having implemented it both ways I've come to the conclusion that trying to 
generalize the recursion into the optimizer rules makes things harder for both 
the simple cases (FilterExec, RepartitionExec) and the complex cases 
(HashJoinExec, ProjectionExec). Doing it this way makes the simple 
implementations trivial and the complex ones will be complex but will be self 
contained and not have to fit into some rigid APIs.
   
   All said and done since the implementation is a lot simpler this way, 
because there is precedent for it in `try_swap_with_projection` and because I'm 
more confident in this API being flexible enough to handle joins, etc I would 
prefer to keep it as is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to