A. Coady created ARROW-18433:
--------------------------------
Summary: Optimize aggregate functions to work with batches.
Key: ARROW-18433
URL: https://issues.apache.org/jira/browse/ARROW-18433
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Python
Affects Versions: 10.0.1
Reporter: A. Coady
Most compute functions work with the dataset API without loading whole columns
into memory. Aggregate functions that are associative could work the same way:
`min`, `max`, `any`, `all`, `sum`, `product`. Even `unique` and `value_counts`
could be computed batch by batch.
A couple of implementation ideas:
* expand the dataset API to support expressions that return scalars
* add a `BatchedArray` type that behaves like a `ChunkedArray` but loads its
chunks lazily
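
To illustrate why associativity is enough, here is a minimal pure-Python sketch
of the batch-wise approach. It is not Arrow code: `aggregate_batches` and
`lazy_batches` are hypothetical names, and plain lists stand in for record
batches. Each batch is reduced to a small partial result, and the partial
results are merged, so only one batch needs to be in memory at a time.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Iterable, Iterator, List, TypeVar

B = TypeVar("B")  # one batch of values
S = TypeVar("S")  # partial aggregation state


def aggregate_batches(
    batches: Iterable[B],
    per_batch: Callable[[B], S],
    combine: Callable[[S, S], S],
) -> S:
    """Fold an associative aggregate over batches, one batch at a time.

    Hypothetical helper: `per_batch` reduces a single batch to a partial
    state, and `combine` merges two partial states. Raises TypeError if
    `batches` is empty, because no initial state is supplied.
    """
    partials = (per_batch(batch) for batch in batches)  # evaluated lazily
    return reduce(combine, partials)


def lazy_batches() -> Iterator[List[int]]:
    """Stand-in for a lazily loaded column; each yield is one batch."""
    yield [3, 1, 4]
    yield [1, 5]
    yield [9, 2, 6]


total = aggregate_batches(lazy_batches(), sum, lambda a, b: a + b)
smallest = aggregate_batches(lazy_batches(), min, min)
distinct = aggregate_batches(lazy_batches(), set, set.union)
counts = aggregate_batches(lazy_batches(), Counter, lambda a, b: a + b)
```

The `unique` and `value_counts` cases differ from the others only in that their
partial state is a set or a table of counts rather than a single scalar. A
`BatchedArray` could play the role of `lazy_batches` here.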