Daniël Heres created ARROW-11300:
------------------------------------

             Summary: [Rust][DataFusion] Improve hash aggregate performance 
with large number of groups in 
                 Key: ARROW-11300
                 URL: https://issues.apache.org/jira/browse/ARROW-11300
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust - DataFusion
            Reporter: Daniël Heres
         Attachments: image-2021-01-18-13-00-36-685.png

Currently, hash aggregates are performing well when having a small number of 
output groups, but the results on db-benchmark 
[https://github.com/h2oai/db-benchmark/pull/182] test on data with a high 
number of output groups.
[https://github.com/apache/arrow/pull/9234] improved the situation a bit, but 
DataFusion is still much slower than even the slowest result when comparing to 
the published results.

This seems mostly having to do with the way we use individual key/groups.
For each new key, we _take_ the indices of the group, resulting in lots of 
small allocations and cache unfriendliness and other overhead if we have many 
keys with only a small (just 1-2) number of rows per group in a batch. Also the 
indices are converted from a Vec to an Array, making the situation worse 
(accounts for ~22% of the instructions on the master branch!), other profiling 
results seem to be from related allocations too.

To make it efficient for tiny groups, we should probably change the hash 
aggregate algorithm to _take_ based on _all_ indices from the batch in one go, 
and "slice" into the resulting array for the individual accumulators.
 
Here is some profiling info of the db-benchmark questions 1-5 against master:

!image-2021-01-18-13-00-36-685.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to