Ben Kietzman created ARROW-11840:
------------------------------------
Summary: [C++][Compute] Support merging GroupByState for
multithreaded aggregation
Key: ARROW-11840
URL: https://issues.apache.org/jira/browse/ARROW-11840
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 3.0.0
Reporter: Ben Kietzman
ARROW-11591 adds support for grouped aggregation but defers merging, which is
non-trivial and unnecessary for single-threaded aggregation. Merging will
eventually be required, however: when aggregating in a multithreaded dataset
scan, each thread's results will need to be combined after the scan
completes.
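To illustrate why the merge is non-trivial, here is a minimal sketch of combining per-thread grouped state. It assumes a grouped sum over string keys. {{GroupedSumState}}, {{GetOrInsert}}, {{Consume}} and {{MergeFrom}} are hypothetical names chosen for this example and are not the GroupByState API from ARROW-11591. The point is that each thread assigns group ids independently, so a merge has to remap the other state's ids through its keys before combining values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread state for a grouped sum; NOT Arrow's GroupByState.
struct GroupedSumState {
  std::unordered_map<std::string, std::size_t> key_to_id;  // key -> dense group id
  std::vector<std::string> keys;                           // group id -> key
  std::vector<std::int64_t> sums;                          // group id -> running sum

  // Return the group id for `key`, allocating a new group if it is unseen.
  std::size_t GetOrInsert(const std::string& key) {
    auto it = key_to_id.find(key);
    if (it != key_to_id.end()) return it->second;
    const std::size_t id = keys.size();
    key_to_id.emplace(key, id);
    keys.push_back(key);
    sums.push_back(0);
    return id;
  }

  // Accumulate one (key, value) pair into this thread's state.
  void Consume(const std::string& key, std::int64_t value) {
    sums[GetOrInsert(key)] += value;
  }

  // Merge another thread's state into this one. Group ids are local to each
  // state, so every group in `other` is looked up (or inserted) by key here.
  void MergeFrom(const GroupedSumState& other) {
    for (std::size_t other_id = 0; other_id < other.keys.size(); ++other_id) {
      const std::size_t id = GetOrInsert(other.keys[other_id]);
      sums[id] += other.sums[other_id];
    }
  }
};
```

In this sketch the cost of {{MergeFrom}} grows with the number of groups in the other state, which is why it is better done once per thread after the scan than once per batch.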
Note that {{ScalarAggExecutor::Consume}} currently assumes that merging
aggregation state is cheap (true for small state, as with "mean", but false
for "group_by") and invokes {{ScalarAggregateKernel::merge}} for each input
batch. ARROW-11591 introduces "group_by" as a special case that is not merged
for each input batch, but ideally this assumption would not be made for any
kernel. When removing it, be sure that merging of the other aggregates
continues to be tested.
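The per-batch pattern described above can be sketched as follows for a small state such as "mean". This is an illustration only: {{MeanState}} and {{MeanWithPerBatchMerge}} are hypothetical names, not the actual executor or kernel code, and returning 0.0 for empty input is an assumption made to keep the example short. A fresh state is built for each batch and then merged into the running total, which is inexpensive here because the merge is two additions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical small aggregation state for "mean"; NOT Arrow's kernel state.
struct MeanState {
  double sum = 0.0;
  std::int64_t count = 0;

  // Fold one batch of values into this state.
  void Consume(const std::vector<double>& batch) {
    for (double v : batch) {
      sum += v;
      ++count;
    }
  }

  // Constant-time merge: cheap enough to run once per batch.
  void MergeFrom(const MeanState& other) {
    sum += other.sum;
    count += other.count;
  }

  // Assumption for this sketch: an empty input yields 0.0.
  double Finalize() const {
    return count == 0 ? 0.0 : sum / static_cast<double>(count);
  }
};

// Per-batch pattern: consume each batch into a fresh state, then merge that
// state into the accumulated one.
inline double MeanWithPerBatchMerge(
    const std::vector<std::vector<double>>& batches) {
  MeanState total;
  for (const auto& batch : batches) {
    MeanState batch_state;
    batch_state.Consume(batch);
    total.MergeFrom(batch_state);
  }
  return total.Finalize();
}
```

A test along these lines, comparing the merged result against a single-pass result, is one way to keep merge coverage for the small aggregates once the per-batch merge is no longer implicit.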