alamb commented on pull request #8553: URL: https://github.com/apache/arrow/pull/8553#issuecomment-724935395
@jorgecarleitao -- when I ran TPC-H benchmark Q1 locally on my machine, it kept all my cores busy and the memory profile stayed low. Thus the improvements this PR offers (running more things in parallel and avoiding buffering) don't address the current bottleneck in the TPC-H query.

I theorize (without presenting evidence yet) that this is due to how the query runs, specifically:

1. Runtime is dominated by filtering, not aggregation.
2. The intermediate results are small (the output of the first stage of grouping is 4 rows).

I have a more detailed performance analysis half written up, but it is not coherent enough to share yet. I am hoping to find some time later this week to write it up.

I (selfishly) think this PR is an improvement, but I don't have any performance numbers to back that up.
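To illustrate why small intermediate results would make buffering and parallel merging matter little, here is a minimal sketch of a two-stage (partial, then final) aggregation. It assumes a hypothetical, simplified `Row` with only a few `lineitem`-like columns, synthetic data, and hypothetical `partial_aggregate` / `final_aggregate` functions. It is not DataFusion's implementation. TPC-H Q1 groups by `l_returnflag, l_linestatus`, so the sketch uses four group keys: the filter touches every input row, while each partial stage emits at most one row per group.

```rust
use std::collections::HashMap;

// Hypothetical, simplified lineitem row (assumption: only the columns the sketch needs).
#[derive(Clone, Copy)]
struct Row {
    returnflag: char,
    linestatus: char,
    shipdate: u32, // assumption: days since some epoch
    quantity: f64,
}

type Key = (char, char);
type Partial = HashMap<Key, (f64, u64)>; // (sum of quantity, row count)

// Stage 1 (hypothetical): filter + partial aggregation over one partition.
// The filter scans every row; the output has at most one entry per distinct group.
fn partial_aggregate(rows: &[Row], cutoff: u32) -> Partial {
    let mut acc = Partial::new();
    for r in rows.iter().filter(|r| r.shipdate <= cutoff) {
        let e = acc.entry((r.returnflag, r.linestatus)).or_insert((0.0, 0));
        e.0 += r.quantity;
        e.1 += 1;
    }
    acc
}

// Stage 2 (hypothetical): merge the small partial results into the final groups.
fn final_aggregate(partials: Vec<Partial>) -> Partial {
    let mut out = Partial::new();
    for p in partials {
        for (k, (sum, count)) in p {
            let e = out.entry(k).or_insert((0.0, 0));
            e.0 += sum;
            e.1 += count;
        }
    }
    out
}

fn main() {
    // Synthetic data (not real TPC-H data): four group keys cycled over many rows.
    let keys = [('A', 'F'), ('N', 'F'), ('N', 'O'), ('R', 'F')];
    let rows: Vec<Row> = (0..1_000_000u32)
        .map(|i| {
            let (f, s) = keys[(i % 4) as usize];
            Row { returnflag: f, linestatus: s, shipdate: i % 100, quantity: 1.0 }
        })
        .collect();

    // Split into 8 partitions, as if each were processed on a separate core.
    let partials: Vec<Partial> = rows
        .chunks(rows.len() / 8)
        .map(|chunk| partial_aggregate(chunk, 90))
        .collect();

    for p in &partials {
        // Stays at 4 no matter how many rows the partition held.
        println!("partial groups: {}", p.len());
    }
    let result = final_aggregate(partials);
    println!("final groups: {}", result.len());
}
```

In this shape, the per-partition work (filtering a million rows) dwarfs the merge of a few tiny hash maps, which is consistent with the theory that filtering, not aggregation, dominates.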
