berkaysynnada commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2846962860
Hi again @adriangb. I've sent the commit, but unfortunately I can't say I fully achieved what I had in mind. Most of the work are idiomatic changes, comments and naming -- I'd still love to hear your thoughts on them. There are a few points I'm not completely comfortable with: 1) New ExecutionPlan APIs: We ended up introducing two new APIs, and I know we can't merge them into one. However, the existing ExecutionPlan APIs are general-purpose, self-explanatory, and simple, whereas these new ones feel harder to understand and are quite specific to the current rule. I initially tried to add something like `filter(&self) -> Arc<dyn PhysicalExpr>` and `try_filter_pushdown(&self, filter: Arc<dyn PhysicalExpr>) -> Self`, but couldn't get it passing all the tests. It seems theoretically possible, but I prefer not spending more time on it at this stage. 2) Related to the first item, I'm also a bit concerned that the two new APIs -- `gather_filters_for_pushdown` and `handle_child_pushdown_result` -- are both doing parts of the pushdown work. The first one analyzes, the second applies changes. This feels a bit confusing, as the responsibilities aren't clearly separated (because I was always thinking the API's like 1-> information of having filter 2-> how do you treat these filters). At least, we should write in detail what each method and parameter means (I couldn't put them into words :/ ) 3) Another thing I couldn't fully understand is why we need to separate the analysis & pushdown work of parent filters vs. self filters. Could you give an example where keeping them separate is necessary? If this distinction isn't critical, it could simplify `handle_child_pushdown_result()`, since we could avoid passing `parent_filters: Vec<Arc<dyn PhysicalExpr>>`, helping address item-1 as well. I don't want to hold up this PR any longer and block your further work, but I wanted to write down my concerns so we can figure out cleaner, simpler ways of achieving the same goal. This is a great feature, and it will be a shame if it ends up being understandable, extendable, or usable by only a few of us 😄 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org