adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2789742021
@berkaysynnada the original implementation had recursion in the optimizer rule. It was @alamb suggestion to do it this way, which mirrors the existing `try_swap_with_projection` method. The TLDR is that because each filter may be allowed or not, and maybe transformed, to be pushed down into each child we end up with a matrix of filters x children that we need to ask the node for. For example, imagine joins, each of its children needs to be treated differently and only some filters can be pushed into some children. That meant that as you suggest it had to be at least 2 methods on ExecutionPlan (2 with complex APIs or 3 simpler ones). The recursion is also a bit wonky because you need to: 1. Recurse down and on your way up transmit not only the new nodes but also the filter support result. 2. There's jumps, eg when you hit a node that doesn't support pushdown but still want to recurse into sub-trees. 3. You're carrying around non-trivial context as you go down. Having implemented it both ways I've come to the conclusion that trying to generalize the recursion into the optimizer rules makes things harder for both the simple cases (FilterExec, RepartitionExec) and the complex cases (HashJoinExec, ProjectionExec). Doing it this way makes the simple implementations trivial and the complex ones will be complex but will be self contained and not have to fit into some rigid APIs. All said and done since the implementation is a lot simpler this way, because there is precedent for it in `try_swap_with_projection` and because I'm more confident in this API being flexible enough to handle joins, etc I would prefer to keep it as is. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org