Re: [I] [EPIC] Adaptive Query Execution (AQE) [datafusion-ballista]

via GitHub Sat, 10 Jan 2026 03:17:26 -0800


milenkovicm commented on issue #1359:
URL: 
https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-3732471645


   
   > * Im curious about optimizations like two-phased aggregations, which would 
span stages (local agg in one, global in the next). I’ve seen limited examples 
of adaptive two-phase aggregations where the local agg can be disabled in the 
presence of a large keyspace. Have you given any thought to how the design 
might allow/disallow future extensions for rules that span > 1 stage and aim to 
adapt a running stage? Sounds like such rules wouldn’t be possible with a pass 
after each stage, which is fine, but wanted to make that thought explicit.
   
   I believe 
`datafusion.execution.skip_partial_aggregation_probe_ratio_threshold` controls 
this, in ballista case this would mean it will just write everything to shuffle 
file. Not sure if we could make some kind of optimizer rule which could use 
cardinality from previous stage to help with this. 
   
   Anyway, good part of work to make it easier to experiment with this things
     
   > * For observability sake, do we plan to log out or keep in memory each 
round of optimizations and the original execution graph?
   
   At the moment prototype holds whole plan and traverse whole plane every 
after every stage. It may be good for observability but it may affect planning 
time, thus it may not be the best way forward. At the moment we can keep it as 
it is until we get it running and then consider how to handle planning and 
observability. I do agree that is a bit of a challenge to understand what 
happened with plan. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] [EPIC] Adaptive Query Execution (AQE) [datafusion-ballista]

Reply via email to