Weston Pace created ARROW-16549:
-----------------------------------
Summary: [C++] Simplify AggregateNodeOptions aggregates/targets
Key: ARROW-16549
URL: https://issues.apache.org/jira/browse/ARROW-16549
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Weston Pace
Currently AggregateNodeOptions is:
{noformat}
class ARROW_EXPORT AggregateNodeOptions : public ExecNodeOptions {
 public:
  // aggregations which will be applied to the targeted fields
  std::vector<internal::Aggregate> aggregates;
  // fields to which aggregations will be applied
  std::vector<FieldRef> targets;
  // output field names for aggregations
  std::vector<std::string> names;
  // keys by which aggregations will be grouped
  std::vector<FieldRef> keys;
};
{noformat}
It is not very obvious how {{aggregates}} and {{targets}} are related. My
initial read of the comments led me to think that each aggregate would be
applied to each target and you would end up with {{len(aggregates) *
len(targets)}} output fields. In reality the {{aggregate}} at index {{i}} only
applies to the {{target}} at index {{i}}. It would be simpler to add a
{{FieldRef target}} to {{internal::Aggregate}} (and {{Aggregate}} should not be
{{internal}}).
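
To make the proposal concrete, here is a rough, self-contained sketch of the paired layout. It does not use the real Arrow headers: {{FieldRef}} and {{Aggregate}} below are simplified stand-ins, and {{LegacyAggregateOptions}}, {{PairedAggregateOptions}} and {{PairTargets}} are hypothetical names chosen only for illustration. {{names}} is left as a parallel vector because the proposal above only covers {{targets}}.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for arrow::FieldRef (assumption: a bare field name).
struct FieldRef {
  std::string name;
};

// Today's shape: aggregates[i] is related to targets[i] only by index.
struct LegacyAggregateOptions {
  std::vector<std::string> aggregates;  // function names, e.g. "sum"
  std::vector<FieldRef> targets;
  std::vector<std::string> names;
  std::vector<FieldRef> keys;
};

// Proposed shape: each aggregate carries the field it applies to.
struct Aggregate {
  std::string function;
  FieldRef target;
};

struct PairedAggregateOptions {
  std::vector<Aggregate> aggregates;
  std::vector<std::string> names;
  std::vector<FieldRef> keys;
};

// Hypothetical helper: folds the parallel vectors into paired aggregates.
PairedAggregateOptions PairTargets(const LegacyAggregateOptions& legacy) {
  // The index-wise relationship only makes sense if the sizes match.
  assert(legacy.aggregates.size() == legacy.targets.size());
  PairedAggregateOptions out;
  out.names = legacy.names;
  out.keys = legacy.keys;
  for (std::size_t i = 0; i < legacy.aggregates.size(); ++i) {
    out.aggregates.push_back(Aggregate{legacy.aggregates[i], legacy.targets[i]});
  }
  return out;
}
```

With this shape the one-to-one relationship is visible in the type itself, and a size mismatch between {{aggregates}} and {{targets}} cannot be expressed at all.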
Alternatively, the entire {{internal::Aggregate}} could be replaced by a "call"
{{arrow::compute::Expression}}.