Ben Kietzman created ARROW-11840:
------------------------------------

             Summary: [C++][Compute] Support merging GroupByState for 
multithreaded aggregation
                 Key: ARROW-11840
                 URL: https://issues.apache.org/jira/browse/ARROW-11840
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 3.0.0
            Reporter: Ben Kietzman


ARROW-11591 adds support for grouped aggregation but defers merging, which is 
non-trivial and unnecessary for single-threaded aggregation. Merging will 
eventually be required, however: when aggregating in a multithreaded dataset 
scan, each thread's results will need to be combined after the scan is 
completed.
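
To make the required operation concrete, here is a minimal sketch of merging 
per-thread grouped state. It is not Arrow code: {{GroupedSumState}}, 
{{MergeFrom}} and {{MergeAll}} are hypothetical names, and a string-keyed hash 
map of sums stands in for whatever GroupByState actually holds.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one thread's grouped aggregation state:
// a running sum per group key. Not the actual GroupByState layout.
struct GroupedSumState {
  std::unordered_map<std::string, int64_t> sums;

  // Fold one batch of (key, value) pairs into this thread's state.
  void Consume(const std::vector<std::string>& keys,
               const std::vector<int64_t>& values) {
    for (std::size_t i = 0; i < keys.size(); ++i) {
      sums[keys[i]] += values[i];
    }
  }

  // Combine another thread's state into this one. The cost is proportional
  // to the number of groups in `other`, not to the number of rows consumed.
  void MergeFrom(const GroupedSumState& other) {
    for (const auto& kv : other.sums) {
      sums[kv.first] += kv.second;
    }
  }
};

// After the scan completes, combine every thread's state into one result.
GroupedSumState MergeAll(const std::vector<GroupedSumState>& per_thread) {
  GroupedSumState merged;
  for (const auto& state : per_thread) {
    merged.MergeFrom(state);
  }
  return merged;
}
```

Groups seen by only one thread are carried over unchanged; groups seen by 
several threads have their partial sums added together.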

Note that {{ScalarAggExecutor::Consume}} currently assumes that merging 
aggregations is not costly (true for small aggregation state as with "mean", 
but false for "group_by") and invokes {{ScalarAggregateKernel::merge}} for 
each input batch. ARROW-11591 introduces "group_by" as a special case which 
will not be merged for each input batch, but ideally this assumption would not 
be made for any kernel. When removing it, be sure that merging other aggregates 
continues to be tested.
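
The two consumption strategies can be contrasted with a small sketch. This is 
an illustration only, not the executor's real code: {{MeanState}}, 
{{ConsumeMergingEachBatch}} and {{ConsumeInPlace}} are hypothetical names, and 
batches are modelled as plain vectors of doubles.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical small aggregation state, similar in spirit to "mean".
struct MeanState {
  double sum = 0.0;
  int64_t count = 0;

  void Consume(const std::vector<double>& batch) {
    for (double v : batch) {
      sum += v;
      ++count;
    }
  }
  // Cheap for a state this small; a grouped state would instead pay a cost
  // proportional to its number of groups on every call.
  void MergeFrom(const MeanState& other) {
    sum += other.sum;
    count += other.count;
  }
  double Finalize() const { return count == 0 ? 0.0 : sum / count; }
};

// Strategy A: build a fresh state per batch and merge it into the running
// total, i.e. one merge per input batch.
template <typename State, typename Batch>
State ConsumeMergingEachBatch(const std::vector<Batch>& batches) {
  State total;
  for (const auto& batch : batches) {
    State local;
    local.Consume(batch);
    total.MergeFrom(local);
  }
  return total;
}

// Strategy B: consume every batch into a single state, so no merge happens
// until per-thread states are combined at the end of the scan.
template <typename State, typename Batch>
State ConsumeInPlace(const std::vector<Batch>& batches) {
  State total;
  for (const auto& batch : batches) {
    total.Consume(batch);
  }
  return total;
}
```

Both strategies should yield the same final state. If the per-batch merge is 
removed from the executor, a check of this shape that still calls the merge 
path directly is one way to keep merging of the other aggregates tested.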


