milenkovicm commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4860174791
Rather than prescribing an AQE implementation, should the focus be on providing
building blocks for implementing AQE planners?
From the perspective of Ballista AQE, the hard part was building the
"event-driven" core that re-plans stages based on statistics collected at the
stage boundaries.
A significant difference from DataFusion execution is that DataFusion makes
quite a lot of decisions before the plan starts (e.g. partitioning or join
implementation); AQE would benefit from deferring such decisions until later in
the flow. https://www.cs.cmu.edu/~15721-f24/papers/AQP_in_Lakehouse.pdf
mentions that Spark converts parts of the logical plan to a physical plan when
needed rather than at execution start. I believe a similar approach would
simplify AQE implementation (Ballista implements DynamicJoin to defer some
partitioning and join decisions, which I see as a hack).
IMHO, having a planner that could act on execution events would be the first
step.
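
To make the idea concrete, here is a rough sketch in Rust of a planner that
reacts to execution events and only picks a join strategy once statistics for
both input stages are known. Every name in it (ExecutionEvent, StageStats,
DeferredJoin, AdaptivePlanner, JoinStrategy) and the byte threshold are
hypothetical and made up for illustration; none of this is an existing
DataFusion or Ballista API, and a real design would likely look different.

```rust
use std::collections::HashMap;

/// Hypothetical statistics collected at a stage boundary.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StageStats {
    pub output_rows: u64,
    pub output_bytes: u64,
}

/// Hypothetical events emitted by the executor while the query runs.
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionEvent {
    StageCompleted { stage_id: usize, stats: StageStats },
}

/// Physical join choices the planner can pick from (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinStrategy {
    Broadcast,
    PartitionedHash,
}

/// A join whose physical implementation is left open until both
/// input stages have reported statistics.
#[derive(Debug)]
pub struct DeferredJoin {
    pub left_stage: usize,
    pub right_stage: usize,
}

/// Hypothetical planner that is driven by execution events.
pub struct AdaptivePlanner {
    broadcast_threshold_bytes: u64,
    stats: HashMap<usize, StageStats>,
    pending: Vec<DeferredJoin>,
}

impl AdaptivePlanner {
    pub fn new(broadcast_threshold_bytes: u64) -> Self {
        Self {
            broadcast_threshold_bytes,
            stats: HashMap::new(),
            pending: Vec::new(),
        }
    }

    /// Register a join whose strategy should be decided later.
    pub fn defer(&mut self, join: DeferredJoin) {
        self.pending.push(join);
    }

    /// Record the event, then resolve every deferred join whose two
    /// inputs now have statistics. Unresolved joins stay pending.
    pub fn on_event(&mut self, event: ExecutionEvent) -> Vec<(DeferredJoin, JoinStrategy)> {
        match event {
            ExecutionEvent::StageCompleted { stage_id, stats } => {
                self.stats.insert(stage_id, stats);
            }
        }

        let mut resolved = Vec::new();
        let mut still_pending = Vec::new();
        for join in std::mem::take(&mut self.pending) {
            match (self.stats.get(&join.left_stage), self.stats.get(&join.right_stage)) {
                (Some(left), Some(right)) => {
                    // Broadcast if the smaller side fits under the threshold.
                    let smaller = left.output_bytes.min(right.output_bytes);
                    let strategy = if smaller <= self.broadcast_threshold_bytes {
                        JoinStrategy::Broadcast
                    } else {
                        JoinStrategy::PartitionedHash
                    };
                    resolved.push((join, strategy));
                }
                _ => still_pending.push(join),
            }
        }
        self.pending = still_pending;
        resolved
    }
}
```

In this sketch the caller would register a DeferredJoin up front, feed each
StageCompleted event into on_event, and turn any returned decisions into
physical plan fragments for the next stage.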
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]