Wes McKinney created ARROW-4124:
-----------------------------------
Summary: [C++] Abstract aggregation kernel API
Key: ARROW-4124
URL: https://issues.apache.org/jira/browse/ARROW-4124
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 0.13.0
Before getting into the particular details of implementing various aggregation
types, we should first put a bit of energy into the abstract API for
aggregating data in a multi-threaded setting.
Aggregators must support both hash/group (e.g. "group by" in SQL or data frame
libraries) modes and non-group modes.
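As a rough illustration of what such an abstract API might look like, here is a minimal sketch in which each thread owns a partial state, consumes its chunks, and the partial states are merged at the end. The names SumState, GroupedSumState, Consume, Merge and Finalize are hypothetical and are not an existing Arrow API; the grouped variant assumes keys and values have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread state for a non-grouped SUM over int64 values.
struct SumState {
  int64_t sum = 0;

  // Consume one chunk of input on the calling thread.
  void Consume(const std::vector<int64_t>& values) {
    for (int64_t v : values) sum += v;
  }

  // Fold another thread's partial result into this one.
  void Merge(const SumState& other) { sum += other.sum; }

  // Produce the final result once all partial states are merged.
  int64_t Finalize() const { return sum; }
};

// Hypothetical grouped ("group by") variant: one partial sum per key.
struct GroupedSumState {
  std::unordered_map<int64_t, int64_t> sums;

  // Assumes keys.size() == values.size().
  void Consume(const std::vector<int64_t>& keys,
               const std::vector<int64_t>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) sums[keys[i]] += values[i];
  }

  // Merge per-key partial sums from another thread.
  void Merge(const GroupedSumState& other) {
    for (const auto& kv : other.sums) sums[kv.first] += kv.second;
  }
};
```

The point of the sketch is only that both modes can share the same consume/merge shape, so the threading logic would not need to know which mode it is driving.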
Aggregations ideally should also support filter pushdown. For example:
{code}
select $AGG($EXPR)
from $TABLE
where $PREDICATE
{code}
Some systems might materialize the post-predicate / filtered version of
{{$EXPR}}, then aggregate that; pandas does this, for example. Vectorized
performance can be much improved by filtering inside the aggregation kernel,
since that avoids allocating and writing the intermediate filtered array.
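To make the two strategies concrete, here is a minimal sketch for SUM. The function names SumMaterialized and SumFused are hypothetical, and the predicate result is assumed to be a byte mask of the same length as the values, with nonzero meaning "keep".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Strategy 1: materialize the filtered values, then aggregate them.
int64_t SumMaterialized(const std::vector<int64_t>& values,
                        const std::vector<uint8_t>& keep) {
  std::vector<int64_t> filtered;  // intermediate allocation
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (keep[i]) filtered.push_back(values[i]);
  }
  int64_t sum = 0;
  for (int64_t v : filtered) sum += v;
  return sum;
}

// Strategy 2: apply the predicate inside the aggregation loop.
int64_t SumFused(const std::vector<int64_t>& values,
                 const std::vector<uint8_t>& keep) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (keep[i]) sum += values[i];
  }
  return sum;
}
```

Both functions return the same result; the fused version makes a single pass and never builds the intermediate array.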
How the predicate true/false values are handled may depend on the
implementation details of the kernel (e.g. SUM or MEAN will be a bit different
from PRODUCT).
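One way to read this difference is sketched below, under the assumption that the predicate arrives as a byte mask holding exactly 0 or 1 per row. All names here are hypothetical. SUM can multiply by the mask because a rejected row then contributes 0, MEAN additionally has to count the accepted rows, and PRODUCT must substitute 1 for rejected rows because multiplying by the mask would zero the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Filtered SUM: a rejected row contributes the additive identity, 0.
// Assumes mask[i] is exactly 0 or 1.
int64_t FilteredSum(const std::vector<int64_t>& values,
                    const std::vector<uint8_t>& mask) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) sum += values[i] * mask[i];
  return sum;
}

// Filtered MEAN needs the number of accepted rows as well as their sum.
struct SumAndCount {
  int64_t sum;
  int64_t count;
};

SumAndCount FilteredSumAndCount(const std::vector<int64_t>& values,
                                const std::vector<uint8_t>& mask) {
  SumAndCount out{0, 0};
  for (std::size_t i = 0; i < values.size(); ++i) {
    out.sum += values[i] * mask[i];
    out.count += mask[i];
  }
  return out;
}

// Filtered PRODUCT: a rejected row must contribute the multiplicative
// identity, 1, so the mask selects between the value and 1.
int64_t FilteredProduct(const std::vector<int64_t>& values,
                        const std::vector<uint8_t>& mask) {
  int64_t product = 1;
  for (std::size_t i = 0; i < values.size(); ++i) {
    product *= mask[i] ? values[i] : 1;
  }
  return product;
}
```

The caller would divide sum by count to obtain the mean, and would need to decide what an empty selection (count of 0) should yield.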