Re: [PR] Add DecomposeAggregate optimizer to rewrite AVG as SUM/COUNT [datafusion]

via GitHub Wed, 15 Apr 2026 09:24:10 -0700


Dandandan commented on PR #21613:
URL: https://github.com/apache/datafusion/pull/21613#issuecomment-4253714613


   > > Reduces accumulator overhead: AVG stores sum + count per group 
internally; splitting into separate SUM and COUNT accumulators is more efficient
   > 
   > Do we know why this is, BTW?
   > 
   > Would it be better to make Average faster somehow?
   
   The largest improvement comes when both `COUNT(*)` and `AVG(col)` appear 
in a single aggregation: after CSE we can reuse the state from count (so 
~halving the state memory usage / allocations / cache misses), i.e. 2 `Vec`s 
instead of 3.
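
A rough sketch of the shared state layout, to make the "2 `Vec`s instead of 3" point concrete. All names here (`SharedState`, `aggregate_shared`, `finish`) are hypothetical and this is not DataFusion's actual accumulator code. It also assumes `col` has no nulls, so `COUNT(*)` and the count needed by `AVG(col)` are the same number:

```rust
/// Hypothetical per-group state after the rewrite plus CSE.
/// Before the rewrite there would be three buffers: one for COUNT(*),
/// and a sum buffer and a count buffer owned by AVG(col).
struct SharedState {
    sum: Vec<f64>,   // SUM(col), one slot per group
    count: Vec<u64>, // COUNT, shared by COUNT(*) and the AVG division
}

/// Accumulate one batch; `group_ids[i]` is the group index of `values[i]`.
/// Assumes no nulls in `values` (illustrative simplification).
fn aggregate_shared(group_ids: &[usize], values: &[f64], num_groups: usize) -> SharedState {
    let mut state = SharedState {
        sum: vec![0.0; num_groups],
        count: vec![0; num_groups],
    };
    for (&g, &v) in group_ids.iter().zip(values) {
        state.sum[g] += v;
        state.count[g] += 1;
    }
    state
}

/// Produce (COUNT(*), AVG(col)) per group from the two shared buffers.
/// A group with no rows yields `None` for the average.
fn finish(state: &SharedState) -> Vec<(u64, Option<f64>)> {
    state
        .sum
        .iter()
        .zip(&state.count)
        .map(|(&s, &c)| (c, if c == 0 { None } else { Some(s / c as f64) }))
        .collect()
}
```

Both outputs are derived from the same two buffers, which is where the saving in state comes from.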
   
   Also, separate sum and count compile to better code, as each has a clear 
translation to (SIMD) instructions, whereas count+sum in a single loop probably 
generates less efficient code (still pretty fast, but...).
   
   Perhaps two separate loops (one for counts / one for sums) would improve 
this, but the CSE optimization (reducing state) is the larger improvement.
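
To illustrate the two loop shapes being compared, here is a minimal sketch with hypothetical function names (`update_fused`, `update_split`). It only shows that the two forms compute the same result; it makes no claim about which one the compiler optimizes better, which would need measuring:

```rust
/// One loop updating both buffers per row (the shape a combined AVG update has).
fn update_fused(group_ids: &[usize], values: &[f64], sums: &mut [f64], counts: &mut [u64]) {
    for (&g, &v) in group_ids.iter().zip(values) {
        sums[g] += v;
        counts[g] += 1;
    }
}

/// Two independent loops: one pass for sums, one pass for counts.
fn update_split(group_ids: &[usize], values: &[f64], sums: &mut [f64], counts: &mut [u64]) {
    for (&g, &v) in group_ids.iter().zip(values) {
        sums[g] += v;
    }
    for &g in group_ids {
        counts[g] += 1;
    }
}
```

Each row adds to the same slots in the same order in both versions, so the resulting sums and counts are identical.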


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

