milenkovicm commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4860174791

   Rather than prescribing an AQE implementation, should the focus be on 
providing building blocks for implementing AQE planners?
    
   From the perspective of Ballista's AQE, the hard part was building an 
"event-driven" core that re-plans stages based on statistics collected at 
stage boundaries. 
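   
   To illustrate what such an "event-driven" core could look like, here is a 
minimal sketch. It is not Ballista's or DataFusion's API: every type and 
function name (`StageStats`, `ExecutionEvent`, `PlannerAction`, 
`EventDrivenPlanner`, `on_event`) is hypothetical, and sizing partitions from 
the input stages' output bytes is just one assumed policy.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Statistics observed at a stage boundary (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct StageStats {
    pub output_rows: u64,
    pub output_bytes: u64,
}

/// Events the executor reports back to the planner (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum ExecutionEvent {
    StageCompleted { stage_id: usize, stats: StageStats },
    StageFailed { stage_id: usize, reason: String },
}

/// Decisions the planner hands to the scheduler (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum PlannerAction {
    RunStage { stage_id: usize, partitions: usize },
    Abort { reason: String },
}

pub struct EventDrivenPlanner {
    /// stage id -> ids of the stages it reads from (empty for leaf stages)
    deps: BTreeMap<usize, Vec<usize>>,
    completed: BTreeMap<usize, StageStats>,
    scheduled: BTreeSet<usize>,
    target_partition_bytes: u64,
}

impl EventDrivenPlanner {
    pub fn new(deps: BTreeMap<usize, Vec<usize>>, target_partition_bytes: u64) -> Self {
        Self {
            deps,
            completed: BTreeMap::new(),
            scheduled: BTreeSet::new(),
            // Guard against division by zero below.
            target_partition_bytes: target_partition_bytes.max(1),
        }
    }

    /// React to one execution event and return the stages that became runnable.
    /// Leaf stages are assumed to have been started by the caller.
    pub fn on_event(&mut self, event: ExecutionEvent) -> Vec<PlannerAction> {
        match event {
            ExecutionEvent::StageFailed { stage_id, reason } => vec![PlannerAction::Abort {
                reason: format!("stage {stage_id} failed: {reason}"),
            }],
            ExecutionEvent::StageCompleted { stage_id, stats } => {
                self.completed.insert(stage_id, stats);
                let mut actions = Vec::new();
                for (&stage, inputs) in &self.deps {
                    let ready = !inputs.is_empty()
                        && !self.scheduled.contains(&stage)
                        && inputs.iter().all(|i| self.completed.contains_key(i));
                    if !ready {
                        continue;
                    }
                    // Partitioning is decided only now, from observed input sizes.
                    let bytes: u64 = inputs.iter().map(|i| self.completed[i].output_bytes).sum();
                    let t = self.target_partition_bytes;
                    let partitions = ((bytes + t - 1) / t).max(1) as usize;
                    self.scheduled.insert(stage);
                    actions.push(PlannerAction::RunStage { stage_id: stage, partitions });
                }
                actions
            }
        }
    }
}
```

   In this sketch the scheduler would call `on_event` once per event and 
submit whatever `RunStage` actions come back; a stage's partition count is 
not fixed until all of its inputs have reported statistics.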
   
   A significant difference from DataFusion execution is that DataFusion makes 
quite a lot of decisions before the plan starts (e.g. partitioning or join 
implementation); AQE would benefit from deferring such decisions until later 
in the flow. https://www.cs.cmu.edu/~15721-f24/papers/AQP_in_Lakehouse.pdf 
mentions that Spark converts parts of the logical plan to a physical plan when 
needed rather than at execution start. I believe a similar approach would 
simplify the AQE implementation (Ballista implements DynamicJoin to defer some 
partitioning and join decisions, which I see as a hack). 
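   
   As a rough illustration of deferring a join decision until input sizes are 
known, consider the sketch below. It is not Ballista's DynamicJoin and not a 
DataFusion API: `BuildSide`, `JoinStrategy`, `JoinConfig`, `DeferredJoin`, 
`resolve`, and both thresholds are hypothetical names and assumed policies.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BuildSide {
    Left,
    Right,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinStrategy {
    /// Ship the small side to every task; the large side is not repartitioned.
    Broadcast { build: BuildSide },
    /// Hash-repartition both sides on the join keys.
    PartitionedHash { build: BuildSide, partitions: usize },
}

/// Assumed tuning knobs; real systems expose different settings.
#[derive(Debug, Clone, Copy)]
pub struct JoinConfig {
    pub broadcast_threshold_bytes: u64,
    pub target_partition_bytes: u64,
}

/// A join node whose physical strategy is chosen late.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeferredJoin {
    Unresolved,
    Resolved(JoinStrategy),
}

impl DeferredJoin {
    /// Pick a strategy from the sizes observed once both inputs have finished.
    /// An already resolved join is returned unchanged.
    pub fn resolve(self, left_bytes: u64, right_bytes: u64, cfg: &JoinConfig) -> DeferredJoin {
        if let DeferredJoin::Resolved(_) = self {
            return self;
        }
        // Build the hash table on the smaller input.
        let build = if left_bytes <= right_bytes {
            BuildSide::Left
        } else {
            BuildSide::Right
        };
        let smaller = left_bytes.min(right_bytes);
        let strategy = if smaller <= cfg.broadcast_threshold_bytes {
            JoinStrategy::Broadcast { build }
        } else {
            let t = cfg.target_partition_bytes.max(1);
            let total = left_bytes.saturating_add(right_bytes);
            // Ceiling division, with at least one partition.
            let partitions = (total / t + u64::from(total % t != 0)).max(1) as usize;
            JoinStrategy::PartitionedHash { build, partitions }
        };
        DeferredJoin::Resolved(strategy)
    }
}
```

   The point of the sketch is only that both the join implementation and its 
partitioning are functions of runtime statistics, so neither has to be fixed 
when the plan is first built.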
   
   IMHO, having a planner that can act on execution events would be the first 
step. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

