berkaysynnada commented on PR #15801:
URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2846962860

   Hi again @adriangb. I've sent the commit, but unfortunately I can't say I 
fully achieved what I had in mind. Most of the work are idiomatic changes, 
comments and naming -- I'd still love to hear your thoughts on them.
   
   There are a few points I'm not completely comfortable with:
   
   1) New ExecutionPlan APIs: We ended up introducing two new APIs, and I know 
we can't merge them into one. However, the existing ExecutionPlan APIs are 
general-purpose, self-explanatory, and simple, whereas these new ones feel 
harder to understand and are quite specific to the current rule.
   I initially tried to add something like `filter(&self) -> Arc<dyn 
PhysicalExpr>` and `try_filter_pushdown(&self, filter: Arc<dyn PhysicalExpr>) 
-> Self`, but couldn't get it passing all the tests. It seems theoretically 
possible, but I prefer not spending more time on it at this stage.
   2) Related to the first item, I'm also a bit concerned that the two new APIs 
-- `gather_filters_for_pushdown` and `handle_child_pushdown_result` -- are both 
doing parts of the pushdown work. The first one analyzes, the second applies 
changes. This feels a bit confusing, as the responsibilities aren't clearly 
separated (because I was always thinking the API's like 1-> information of 
having filter 2-> how do you treat these filters). At least, we should write in 
detail what each method and parameter means (I couldn't put them into words :/ )
   3) Another thing I couldn't fully understand is why we need to separate the 
analysis & pushdown work of parent filters vs. self filters. Could you give an 
example where keeping them separate is necessary?
   If this distinction isn't critical, it could simplify 
`handle_child_pushdown_result()`, since we could avoid passing `parent_filters: 
Vec<Arc<dyn PhysicalExpr>>`, helping address item-1 as well.
   
   I don't want to hold up this PR any longer and block your further work, but 
I wanted to write down my concerns so we can figure out cleaner, simpler ways 
of achieving the same goal. This is a great feature, and it will be a shame if 
it ends up being understandable, extendable, or usable by only a few of us 😄


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to