Jefffrey commented on code in PR #8963:
URL: https://github.com/apache/arrow-datafusion/pull/8963#discussion_r1476878567
##########
datafusion/expr/src/accumulator.rs:
##########
@@ -21,11 +21,13 @@ use arrow::array::ArrayRef;
use datafusion_common::{internal_err, DataFusionError, Result, ScalarValue};
use std::fmt::Debug;
-/// Describes an aggregate functions's state.
+/// Tracks an aggregate functions's state.
///
-/// `Accumulator`s are stateful objects that live throughout the
-/// evaluation of multiple rows and aggregate multiple values together
-/// into a final output aggregate.
+/// `Accumulator`s are stateful objects that implement a single group. They
+/// aggregate values from multiple rows together into a final output aggregate.
+///
+/// [`GroupsAccumulator]` is a additional more performant (but also complex)
API
Review Comment:
```suggestion
/// [`GroupsAccumulator]` is an additional more performant (but also
complex) API
```
##########
datafusion/expr/src/groups_accumulator.rs:
##########
@@ -59,12 +59,24 @@ impl EmitTo {
/// `GroupAccumulator` implements a single aggregate (e.g. AVG) and
/// stores the state for *all* groups internally.
///
+/// # Notes on Implementing `GroupAccumulator`
+///
+/// All aggregates must first implement the simpler [`Accumulator`] trait,
which
+/// handles state for a single group. Implementing `GroupsAccumulator` is
+/// optional and is harder to implement than `Accumulator`, but can be much
+/// faster for queries with many group values. See the [Aggregating Millions
of
+/// Groups Fast blog] for more background.
+///
+/// # Details
/// Each group is assigned a `group_index` by the hash table and each
-/// accumulator manages the specific state, one per group_index.
+/// accumulator manages the specific state, one per `group_index`.
///
/// group_indexes are contiguous (there aren't gaps), and thus it is
Review Comment:
```suggestion
/// `group_index`es are contiguous (there aren't gaps), and thus it is
```
##########
datafusion/expr/src/accumulator.rs:
##########
@@ -21,11 +21,13 @@ use arrow::array::ArrayRef;
use datafusion_common::{internal_err, DataFusionError, Result, ScalarValue};
use std::fmt::Debug;
-/// Describes an aggregate functions's state.
+/// Tracks an aggregate functions's state.
Review Comment:
```suggestion
/// Tracks an aggregate function's state.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]