Re: [I] Fusing partial aggregation with repartition [datafusion]

via GitHub Tue, 24 Sep 2024 17:02:42 -0700


yjshen commented on issue #12596:
URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2372599393


   > Introduce the partitioned hashtable in partial aggregation, and we 
partition the data before inserting them into hashtable.
   > And we push them into final aggregation partition by partition after, 
rather than split them again in repartition, and merge them again in coalesce.
   
   I'm not clear on how this proposal works. Could you please explain why it 
provides performance benefits compared to partial aggregation, exchange, and 
final aggregation? Is the proposal aimed specifically at accelerating 
high-cardinality aggregation, or is it intended to improve aggregation 
performance in general?
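
   To make the question concrete, here is a minimal sketch of how the quoted 
proposal could be read. It is only an illustration under assumptions: the 
names `partition_of`, `partial_sum_partitioned`, and `final_sum` are 
hypothetical and are not DataFusion APIs, and a plain SUM over string keys 
stands in for the real aggregation operators. The partial stage keeps one hash 
table per output partition, so the final stage can merge each partition 
directly instead of receiving rows that were split again by a repartition step.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: choose the output partition for a group key.
fn partition_of(key: &str, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % num_partitions
}

/// Partial SUM that keeps one hash table per output partition, so its
/// output is already split the way the final stage needs it.
fn partial_sum_partitioned(
    rows: &[(&str, i64)],
    num_partitions: usize,
) -> Vec<HashMap<String, i64>> {
    let mut tables = vec![HashMap::new(); num_partitions];
    for (key, value) in rows {
        let p = partition_of(key, num_partitions);
        *tables[p].entry(key.to_string()).or_insert(0) += *value;
    }
    tables
}

/// Final SUM for one partition: merge that partition's table from every
/// partial task. No row is re-hashed into a different partition here.
fn final_sum(
    partials: &[Vec<HashMap<String, i64>>],
    partition: usize,
) -> HashMap<String, i64> {
    let mut merged = HashMap::new();
    for task in partials {
        for (key, value) in &task[partition] {
            *merged.entry(key.clone()).or_insert(0) += *value;
        }
    }
    merged
}

fn main() {
    let num_partitions = 4;
    // Two partial-aggregation tasks, each over its own slice of the input.
    let partials = vec![
        partial_sum_partitioned(&[("a", 1), ("b", 2), ("a", 3)], num_partitions),
        partial_sum_partitioned(&[("b", 10), ("c", 5)], num_partitions),
    ];
    // Each final partition only reads the matching table from each task.
    for p in 0..num_partitions {
        println!("partition {p}: {:?}", final_sum(&partials, p));
    }
}
```

   In this reading, the hashing work that the exchange would do per row is 
done once, inside the partial stage. Whether that is the intended source of 
the speedup is exactly what I would like to confirm.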


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


