avantgardnerio commented on issue #23194: URL: https://github.com/apache/datafusion/issues/23194#issuecomment-4844950636
Hi @gabotechs, I think your list of concerns is really helpful. I usually work with Claude, and he provided this helpful mapping of each concern to what's in the PR:

(image: mapping of each concern to the PR, https://github.com/user-attachments/assets/d2d65ead-7ee8-48eb-906b-b73cb0cfb593)

I think that makes the picture clearer than it has been so far. I haven't been at this long, but I think AQE (adaptive query execution) typically carries some baggage:

1. "AQE is only for batch processes": I think your work is really innovative and shows this is untrue.
2. "AQE is a distribution concern": I think Andy's work on DataFusion/Ballista (and this PR) shows that what is good for the distributed goose is also generally good for the local gander.

I like what you said about idempotent rules being a keystone, and I see folks have been putting a lot of effort into making rules idempotent upstream so that downstream repos benefit. This puts DataFusion in an interesting position: it has to maintain the code (and the invariant) without being able to test it easily or reap any direct performance benefit. (Yes, there can be unit tests, but contributors must know to write and maintain them, and why.)

What this PR hopes to offer the downstream community is:

1. An operational whitelist of AQE-ready optimizer rules, built gradually over time (idempotent ones might be ready to go!)
2. Operators that are AQE-compatible
3. _Most importantly, in-repo benefits_ (performance gains), so DataFusion contributors have an incentive to test and maintain them
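The idempotence invariant and the "whitelist of AQE-ready rules" idea can be illustrated with a small test. This is a minimal Rust sketch under stated assumptions, not DataFusion's API: `Plan`, `Rule`, `CollapseRepartition`, `AlwaysRepartition`, `is_idempotent_on`, and `aqe_whitelist` are all hypothetical names for illustration. It checks whether applying a rule a second time leaves the plan unchanged, and admits into the whitelist only rules that pass on every sample plan. Passing on sample plans is evidence of idempotence, not a proof.

```rust
use std::collections::HashSet;

/// Hypothetical toy plan; a stand-in for a real physical plan, not DataFusion's `ExecutionPlan`.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Repartition { partitions: usize, input: Box<Plan> },
}

/// Hypothetical rule trait; DataFusion's real optimizer traits have different signatures.
pub trait Rule {
    fn name(&self) -> &str;
    fn apply(&self, plan: Plan) -> Plan;
}

/// Collapses directly nested repartitions, keeping the outermost partition count.
/// Applying it again to its own output changes nothing, so it is idempotent.
pub struct CollapseRepartition;

impl Rule for CollapseRepartition {
    fn name(&self) -> &str {
        "collapse_repartition"
    }

    fn apply(&self, plan: Plan) -> Plan {
        match plan {
            Plan::Repartition { partitions, input } => match self.apply(*input) {
                // The (already collapsed) child is itself a repartition: drop it.
                Plan::Repartition { input: inner, .. } => Plan::Repartition { partitions, input: inner },
                other => Plan::Repartition { partitions, input: Box::new(other) },
            },
            Plan::Filter { predicate, input } => Plan::Filter { predicate, input: Box::new(self.apply(*input)) },
            scan => scan,
        }
    }
}

/// Deliberately non-idempotent: wraps the plan in another repartition on every call.
pub struct AlwaysRepartition;

impl Rule for AlwaysRepartition {
    fn name(&self) -> &str {
        "always_repartition"
    }

    fn apply(&self, plan: Plan) -> Plan {
        Plan::Repartition { partitions: 8, input: Box::new(plan) }
    }
}

/// True if `rule(rule(plan)) == rule(plan)` for this particular plan.
pub fn is_idempotent_on(rule: &dyn Rule, plan: &Plan) -> bool {
    let once = rule.apply(plan.clone());
    let twice = rule.apply(once.clone());
    once == twice
}

/// Names of the rules that were idempotent on every sample plan (a hypothetical "AQE-ready" list).
pub fn aqe_whitelist(rules: &[&dyn Rule], samples: &[Plan]) -> HashSet<String> {
    rules
        .iter()
        .filter(|r| samples.iter().all(|p| is_idempotent_on(**r, p)))
        .map(|r| r.name().to_string())
        .collect()
}
```

Because the check is just "apply twice and compare", it is cheap to run over many sample plans. A rule that fails it is simply left off the list rather than breaking re-optimization at runtime.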
