alamb commented on issue #5646:
URL: 
https://github.com/apache/arrow-datafusion/issues/5646#issuecomment-1628654865

   BTW I think DF's memory usage increases with the number of cores because 
the first partial aggregate phase uses RoundRobin repartitioning, so each 
partition's hash table ends up with an entry for every group. 
   
   To avoid this, we would need to hash repartition the input on the group 
keys so that the different partitions see different subsets of the group keys.
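
   A toy sketch of the effect described above, in plain Python rather than 
DataFusion code. All names here (`round_robin`, `hash_partition`, 
`partial_aggregate`, `total_entries`) are hypothetical helpers, and the input 
is synthetic: 1000 integer group keys, each repeated enough times that 
round-robin distribution sends every key to every partition. It only counts 
hash table entries and does not model DataFusion's actual operators or memory 
accounting.

```python
from collections import defaultdict

def round_robin(rows, n):
    """Distribute rows across n partitions in arrival order, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, key in enumerate(rows):
        parts[i % n].append(key)
    return parts

def hash_partition(rows, n):
    """Distribute rows so that all rows with the same key land in one partition."""
    parts = [[] for _ in range(n)]
    for key in rows:
        parts[hash(key) % n].append(key)
    return parts

def partial_aggregate(partitions):
    """Build one hash table per partition: group key -> running count."""
    tables = []
    for rows in partitions:
        table = defaultdict(int)
        for key in rows:
            table[key] += 1
        tables.append(dict(table))
    return tables

def total_entries(tables):
    """Total number of hash table entries held across all partitions."""
    return sum(len(t) for t in tables)

NUM_GROUPS = 1000
NUM_PARTITIONS = 8  # stands in for the number of cores

# Each key appears 2 * NUM_PARTITIONS times in a row, so round-robin
# distribution is guaranteed to send every key to every partition.
rows = [k for k in range(NUM_GROUPS) for _ in range(2 * NUM_PARTITIONS)]

rr_tables = partial_aggregate(round_robin(rows, NUM_PARTITIONS))
hash_tables = partial_aggregate(hash_partition(rows, NUM_PARTITIONS))

print(total_entries(rr_tables))    # 8000: every partition holds all 1000 groups
print(total_entries(hash_tables))  # 1000: each group lives in exactly one partition
```

   In this sketch the round-robin entry count grows linearly with 
`NUM_PARTITIONS`, while the hash-partitioned count stays at the number of 
distinct groups.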


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
