gabotechs commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4842226498
Answering now to https://github.com/apache/datafusion/issues/23194#issuecomment-4838546624:

> I'm trying to compile a list of the ongoing AQE-like efforts, inside the existing DF framework, including my own:
> - adaptive agg ...
> - parallel sort ...
> - halo rows ...
> - prefix scan ...

All those PRs ship some form of adaptation local to individual nodes, without reshaping the topology of the execution plan at runtime (correct me if I'm wrong). I guess they still fit the definition of "adaptive", but that is a different topic from what `ballista` and `datafusion-distributed` aim for in their parallel efforts towards AQE: reshaping the plan topology at execution time for fragments of the plan that have not yet begun execution.

For reshaping the plan topology at execution time, good idempotent `PhysicalOptimizerRule` implementations that can be re-run on an already optimized plan are key for any system that has a way of collecting runtime `datafusion::physical_plan::Statistics`, whether that is `ballista`, `datafusion-distributed`, or `datafusion` itself. I bet having those in `datafusion` would be nice.

Now, as for the mechanism that splits the plan into fragments and collects runtime `datafusion::physical_plan::Statistics` for the already executed fragments... I don't see how we can come up with an abstraction general enough to be worth hosting in `datafusion`.
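To make the point about idempotent rules that can be re-run on an already optimized plan a bit more concrete, here is a toy sketch. It is self-contained and deliberately does not use DataFusion: `Plan`, `Rule`, `ChooseJoinStrategy`, `with_runtime_build_rows`, and `top_strategy` are hypothetical names, not the real `PhysicalOptimizerRule` or `Statistics` API. The broadcast-vs-partitioned join choice is just an assumed example of a decision that runtime statistics could overturn. The two properties shown are that running the rule twice gives the same plan as running it once, and that when the rule sees an already optimized plan with fresher statistics it re-derives its decision instead of trusting the earlier one.

```rust
/// Toy physical plan tree (hypothetical stand-in for a real plan).
#[derive(Clone, Debug, PartialEq)]
enum Plan {
    Scan {
        #[allow(dead_code)]
        table: String,
    },
    /// `build_rows` is the best row-count estimate currently known for the
    /// build (left) side; `strategy` is whatever the optimizer last chose.
    HashJoin {
        left: Box<Plan>,
        right: Box<Plan>,
        strategy: JoinStrategy,
        build_rows: Option<u64>,
    },
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinStrategy {
    Unset,
    Broadcast,
    Partitioned,
}

/// Hypothetical optimizer-rule trait for this sketch only.
trait Rule {
    fn optimize(&self, plan: Plan) -> Plan;
}

/// Picks a join strategy purely from the statistics present in the plan,
/// so it can safely be re-run on its own output.
struct ChooseJoinStrategy {
    broadcast_threshold: u64,
}

impl Rule for ChooseJoinStrategy {
    fn optimize(&self, plan: Plan) -> Plan {
        match plan {
            scan @ Plan::Scan { .. } => scan,
            // The previously chosen strategy is ignored on purpose: the
            // decision is recomputed from the current stats every time.
            Plan::HashJoin { left, right, build_rows, .. } => {
                let left = Box::new(self.optimize(*left));
                let right = Box::new(self.optimize(*right));
                let strategy = match build_rows {
                    Some(n) if n <= self.broadcast_threshold => JoinStrategy::Broadcast,
                    _ => JoinStrategy::Partitioned,
                };
                Plan::HashJoin { left, right, strategy, build_rows }
            }
        }
    }
}

/// Hypothetical hook: overwrite the planning-time estimate of the top-level
/// join's build side with a row count observed after that fragment ran.
fn with_runtime_build_rows(plan: Plan, observed: u64) -> Plan {
    match plan {
        Plan::HashJoin { left, right, strategy, .. } => Plan::HashJoin {
            left,
            right,
            strategy,
            build_rows: Some(observed),
        },
        other => other,
    }
}

/// Returns the strategy of the top-level join, if the root is a join.
fn top_strategy(plan: &Plan) -> Option<JoinStrategy> {
    match plan {
        Plan::HashJoin { strategy, .. } => Some(*strategy),
        Plan::Scan { .. } => None,
    }
}
```

In this toy model, a planning-time estimate of 50,000 build rows yields `Partitioned`, and re-running the rule leaves the plan unchanged. If the already executed fragment reports only 200 rows, re-running the same rule on the already optimized plan switches the join to `Broadcast`. How the fragments are split and how those runtime numbers are collected is exactly the part left out here.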
