avantgardnerio commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4810556878
> I'm not sure we can make one common AQE but we should definitely try.

I'd like to highlight that Coralogix runs Ballista internally for distribution, so this PR was definitely written with distribution in mind. The key architectural trait is the original "isomorphic scaling" approach that DataFusion and Ballista took from inception: the mechanisms DataFusion uses to spread load across cores are the same ones Ballista uses to distribute work across executors. The intent here was to follow that pattern: stage boundaries inserted by these rules would become shuffle writes in Ballista. (The additional parallel window function PRs all follow this pattern as well.)

So I don't think this is a competing approach, and a `datafusion-aqe` trait would just end up being a third effort that diverges from the existing approaches rather than converging them. What this PR hopes to do is add the necessary primitives within DataFusion to make distributed AQE easier.

I'm not as familiar with `datafusion-distributed`, but I'm going to spend the rest of my day comparing the other two implementations and roadmaps with what is provided here, with the goal of finding commonality between them and moving it in-repo.
