Re: [I] Implement cache-efficient partial aggregation by Leis et al [datafusion]

via GitHub Sat, 07 Mar 2026 10:02:28 -0800


Rachelint commented on issue #20773:
URL: https://github.com/apache/datafusion/issues/20773#issuecomment-4017032783


   As I see it, one of the most obvious bottlenecks of aggregation in DataFusion
is `RepartitionExec`.
   
   The partitioned hash table approach may help. I ran an experiment with it
before:
   https://github.com/apache/datafusion/pull/12526
   The performance improvement there came from removing `RepartitionExec`.
   
   However, taking `skip partial aggregations` into account, I don't think it is
a good, general way to solve the performance problem of `RepartitionExec`.
   
   I think #15383 may be the better approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


