[GitHub] [arrow-datafusion] alamb commented on issue #5646: TPCH, Query 18 and 17 very slow

via GitHub Mon, 10 Jul 2023 03:18:53 -0700


alamb commented on issue #5646:
URL: 
https://github.com/apache/arrow-datafusion/issues/5646#issuecomment-1628654865


   BTW I think the reason DataFusion's memory usage increases with the number 
of cores is that the first partial aggregate phase uses RoundRobin 
repartitioning (and thus each partition's hash table ends up with an entry for 
every group). 
   
   To avoid this, we would need to hash repartition the input on the group 
keys so that different partitions see different subsets of the group keys.
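
   To make the effect concrete, here is a toy model in Python. It is only a sketch and not DataFusion's actual code: the function names, the row layout `(group_key, value)`, and the sizes are all assumptions made up for illustration. It counts how many hash table entries the partial aggregate phase holds in total under each repartitioning scheme.

```python
# Toy model only: names and sizes are hypothetical, not DataFusion APIs.

def round_robin_partition(rows, n):
    """Send row i to partition i % n, ignoring the group key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n):
    """Send each row to the partition chosen by hashing its group key."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[hash(key) % n].append((key, value))
    return parts

def partial_aggregate(part):
    """Per-partition SUM(value) GROUP BY key, kept in a hash table (dict)."""
    table = {}
    for key, value in part:
        table[key] = table.get(key, 0) + value
    return table

def total_hash_table_entries(parts):
    """Total number of group entries held across all partial aggregates."""
    return sum(len(partial_aggregate(p)) for p in parts)

num_groups = 1000
num_partitions = 8
# Each group key appears 16 times in a row, so round robin spreads every
# key over all 8 partitions.
rows = [(g, 1) for g in range(num_groups) for _ in range(16)]

rr_entries = total_hash_table_entries(round_robin_partition(rows, num_partitions))
hash_entries = total_hash_table_entries(hash_partition(rows, num_partitions))

print("round robin:", rr_entries)
print("hash on group key:", hash_entries)
```

   In this toy input, round robin leaves every partition holding all 1000 groups (8 x 1000 = 8000 entries in total, growing with the partition count), while hashing on the group key keeps the total at 1000 no matter how many partitions there are.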


