[GitHub] [arrow] alamb commented on pull request #8553: ARROW-10366: [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate

GitBox Tue, 10 Nov 2020 12:04:14 -0800


alamb commented on pull request #8553:
URL: https://github.com/apache/arrow/pull/8553#issuecomment-724935395



   @jorgecarleitao -- when I ran TPCH benchmark Q1 locally on my machine, I
found it kept all my cores busy and memory usage stayed low. Thus the
improvements offered by this PR (running more things in parallel and avoiding
buffering) don't address the (current) bottleneck in that TPCH query.
   
   I theorize (without presenting evidence yet) that this is due to how the 
query is running and that:
   1. Runtime is dominated by filtering (not aggregation)
   2. The intermediate results are small (the output of the first stage of 
grouping is 4 rows)
   
   I have a more detailed performance analysis half written up, but it is not
coherent enough to share yet. I am hoping for some time later this week to
finish writing it up.
   
   I (selfishly) think this PR is an improvement, but I don't have any
performance numbers to back that up.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
