berkaysynnada commented on PR #15801:
URL: https://github.com/apache/datafusion/pull/15801#issuecomment-2827570314
> 1. Not requiring downcast matching of specific `ExecutionPlan`s (as
discussed previously this could be resolved by adding a new method, but in the
end both approaches require two methods).
We can actually do that with the current version in a very similar way, and
I think the API should just look like:
```rust
/// Gets the fetch count for the operator. `None` means there is no fetch.
fn fetch(&self) -> Option<usize> {
None
}
```
Similarly, we could just define `fn filter(&self) -> Option<Arc<dyn
PhysicalExpr>>`
> 2. Doing less invasive modification of the plans (e.g. not popping and
then re-inserting FilterExecs).
That’s kind of the nature of transform APIs. I don’t think it’s a big deal.
When the algorithm encounters a FilterExec, it either tries to push it down, or
takes a conservative path and only removes it when it’s certain that it should
be removed. Either way, we're biasing the rule, and if the outcome doesn’t go
our way, the cost is just a small overhead.
> 3. Reducing the algorithmic complexity of the optimizer rule (it's now a
single pass over the tree for all cases, and avoids cloning unless strictly
needed).
The filter() API also enables us to convert the current algorithm to a
single-pass one.
> 4. Allowing plans to decide how any of the leftover filters from parents
or that they generated should be handled (I think this will be useful for TopK).
That’s still possible with the current approach.
>
> This is in some ways simpler and in some ways more complex than what we
have right now. The difference is not large in terms of LOC, I think the rest
boils down to personal preference.
What you’ve done here works very well, but it seems a bit more complicated,
and I think folks who aren’t deep in this context might find it harder to
follow. As I mentioned, I prefer to make things more explicit and easier to
understand in these early implementation stages. Once the requirements and
implementations are stable, we can shift focus to optimizations.
I'm planning to open a PR which brings in the revisit recursion variant and
introducing the filter() API. With those, I hope to address your concerns. If I
don't get something wrong, these issues don't block any implementation, but you
are only worried about some design nuances and performance?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]