berkaysynnada commented on PR #15801: URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2827570314
> 1. Not requiring downcast matching of specific `ExecutionPlan`s (as discussed previously this could be resolved by adding a new method, but in the end both approaches require two methods). We can actually do that with the current version in a very similar way, and I think the API should just look like: ```rust /// Gets the fetch count for the operator. `None` means there is no fetch. fn fetch(&self) -> Option<usize> { None } ``` Similarly, we could just define `fn filter(&self) -> Option<Arc<dyn PhysicalExpr>>` > 2. Doing less invasive modification of the plans (e.g. not popping and then re-inserting FilterExecs). That’s kind of the nature of transform APIs. I don’t think it’s a big deal. When the algorithm encounters a FilterExec, it either tries to push it down, or takes a conservative path and only removes it when it’s certain that it should be removed. Either way, we're biasing the rule, and if the outcome doesn’t go our way, the cost is just a small overhead. > 3. Reducing the algorithmic complexity of the optimizer rule (it's now a single pass over the tree for all cases, and avoids cloning unless strictly needed). The filter() API also enables us to convert the current algorithm to a single-pass one. > 4. Allowing plans to decide how any of the leftover filters from parents or that they generated should be handled (I think this will be useful for TopK). That’s still possible with the current approach. > > This is in some ways simpler and in some ways more complex than what we have right now. The difference is not large in terms of LOC, I think the rest boils down to personal preference. What you’ve done here works very well, but it seems a bit more complicated, and I think folks who aren’t deep in this context might find it harder to follow. As I mentioned, I prefer to make things more explicit and easier to understand in these early implementation stages. Once the requirements and implementations are stable, we can shift focus to optimizations. I'm planning to open a PR which brings in the revisit recursion variant and introducing the filter() API. With those, I hope to address your concerns. If I don't get something wrong, these issues don't block any implementation, but you are only worried about some design nuances and performance? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org