avantgardnerio commented on issue #23194:
URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4810556878

   > I'm not sure we can make one common AQE but we should definitely try.
   
   I'd like to highlight that Coralogix runs Ballista internally for 
distribution, so this PR was definitely written with distribution in mind.
   
   The key architectural trait I'd like to highlight is the original 
"isomorphic scaling" approach that DataFusion & Ballista took from inception: 
the same mechanism DataFusion uses to spread load across cores is the one 
Ballista uses to distribute work across executors.
   
   The intent here was to follow that pattern: stage boundaries inserted by 
these rules would become shuffle writes in Ballista. (The additional parallel 
window function PRs all follow this pattern as well.)
   
   So I don't think this is a competing approach, and a `datafusion-aqe` trait 
would just end up being a third effort that diverges from the existing 
approaches rather than converging them. What this PR hopes to do is add the 
necessary primitives within DataFusion to make distributed AQE easier.
   
   I'm not as familiar with `datafusion-distributed`, but I'm going to spend 
the rest of my day comparing the other two implementations and roadmaps with 
what is provided here - the goal being to find commonality between them and 
move it in-repo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

