Re: [PR] refactor filter pushdown apis [datafusion]

via GitHub Thu, 24 Apr 2025 06:09:54 -0700


berkaysynnada commented on PR #15801:
URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2827570314


    > 1. Not requiring downcast matching of specific `ExecutionPlan`s (as 
discussed previously this could be resolved by adding a new method, but in the 
end both approaches require two methods).
   
   We can actually do that with the current version in a very similar way, and 
I think the API should just look like:
   ```rust
   /// Gets the fetch count for the operator. `None` means there is no fetch.
   fn fetch(&self) -> Option<usize> {
       None
   }
   ```
   Similarly, we could just define `fn filter(&self) -> Option<Arc<dyn 
PhysicalExpr>>`
   
   > 2. Doing less invasive modification of the plans (e.g. not popping and 
then re-inserting FilterExecs).
   
   That’s kind of the nature of transform APIs. I don’t think it’s a big deal. 
When the algorithm encounters a FilterExec, it either tries to push it down, or 
takes a conservative path and only removes it when it’s certain that it should 
be removed. Either way, we're biasing the rule, and if the outcome doesn’t go 
our way, the cost is just a small overhead.
   
   > 3. Reducing the algorithmic complexity of the optimizer rule (it's now a 
single pass over the tree for all cases, and avoids cloning unless strictly 
needed).
   
   The filter() API also enables us to convert the current algorithm to a 
single-pass one.
   
   > 4. Allowing plans to decide how any of the leftover filters from parents 
or that they generated should be handled (I think this will be useful for TopK).
   
   That’s still possible with the current approach.
   
   > 
   > This is in some ways simpler and in some ways more complex than what we 
have right now. The difference is not large in terms of LOC, I think the rest 
boils down to personal preference.
   
   What you’ve done here works very well, but it seems a bit more complicated, 
and I think folks who aren’t deep in this context might find it harder to 
follow. As I mentioned, I prefer to make things more explicit and easier to 
understand in these early implementation stages. Once the requirements and 
implementations are stable, we can shift focus to optimizations.
   
   I'm planning to open a PR which brings in the revisit recursion variant and 
introducing the filter() API. With those, I hope to address your concerns. If I 
don't get something wrong, these issues don't block any implementation, but you 
are only worried about some design nuances and performance?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] refactor filter pushdown apis [datafusion]

Reply via email to