lidavidm commented on a change in pull request #10887:
URL: https://github.com/apache/arrow/pull/10887#discussion_r690459001



##########
File path: docs/source/cpp/compute.rst
##########
@@ -234,6 +238,88 @@ Notes:
 
 * \(6) Output is Float64 or input type, depending on QuantileOptions.
 
+* \(7) tdigest/t-digest computes approximate quantiles, and so only needs a
+  fixed amount of memory. See the `reference implementation
+  <https://github.com/tdunning/t-digest>`_ for details.
+
+Grouped Aggregations ("group by")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Grouped aggregations are not directly invokable, but are used as part of a
+group by operation. Like scalar aggregations, grouped aggregations reduce
+multiple input values to a single output value. Instead of aggregating all
+values of the input, however, grouped aggregations partition the input
+values on some set of "key" columns, then aggregate each group individually,
+and emit one output per input group.
+
+As an example, for the following table:
+
++-----------------+--------------+
+| Column "x"      | Column "key" |
++=================+==============+
+| 2               | "a"          |
++-----------------+--------------+
+| 5               | "a"          |
++-----------------+--------------+
+| null            | "b"          |
++-----------------+--------------+
+| null            | "b"          |
++-----------------+--------------+
+| null            | null         |
++-----------------+--------------+
+| 5               | null         |
++-----------------+--------------+
+
+We compute a sum of column "x", grouped on the key column "key". This gives us
+three groups:
+
++-----------------+--------------+
+| Column "sum(x)" | Column "key" |
++=================+==============+
+| 7               | "a"          |
++-----------------+--------------+
+| null            | "b"          |
++-----------------+--------------+
+| 5               | null         |
++-----------------+--------------+
+
+The supported aggregation functions are as follows.
+
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| Function name | Arity | Input types | Output type    | Options class                    | Notes |
++===============+=======+=============+================+==================================+=======+
+| hash_all      | Unary | Boolean     | Scalar Int64   | :struct:`ScalarAggregateOptions` | \(1)  |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_any      | Unary | Any         | Scalar Int64   | :struct:`ScalarAggregateOptions` | \(1)  |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_count    | Unary | Boolean     | Scalar Int64   | :struct:`CountOptions`           | \(2)  |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_mean     | Unary | Numeric     | Scalar Float64 |                                  |       |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_min_max  | Unary | Numeric     | Scalar Struct  | :struct:`ScalarAggregateOptions` | \(3)  |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_stddev   | Unary | Numeric     | Scalar Float64 | :struct:`VarianceOptions`        |       |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_sum      | Unary | Numeric     | Scalar Numeric |                                  |       |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_tdigest  | Unary | Numeric     | Scalar Float64 | :struct:`TDigestOptions`         | \(4)  |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+| hash_variance | Unary | Numeric     | Scalar Float64 | :struct:`VarianceOptions`        |       |
++---------------+-------+-------------+----------------+----------------------------------+-------+
+
+* \(1) If null values are taken into account, by setting the
+  ScalarAggregateOptions parameter skip_nulls = false, then `Kleene logic`_

Review comment:
       hash_any/hash_all already supported skip_nulls, but not min_count.
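
For reference, the grouped sum from the example table in the hunk above can be sketched in plain Python. This is only an illustration of the semantics the table shows, not Arrow's implementation: `grouped_sum` is a hypothetical helper, and its `skip_nulls` and `min_count` parameters are assumptions modeled on the ScalarAggregateOptions fields mentioned in this thread, with defaults chosen to reproduce the table.

```python
def grouped_sum(xs, keys, skip_nulls=True, min_count=1):
    """Sum xs per key; None stands in for null (hypothetical helper)."""
    # key -> [running sum, number of non-null values, whether a null was seen]
    groups = {}
    for x, key in zip(xs, keys):
        state = groups.setdefault(key, [0, 0, False])
        if x is None:
            state[2] = True
        else:
            state[0] += x
            state[1] += 1

    result = {}
    for key, (total, count, saw_null) in groups.items():
        # A group yields null if nulls are not skipped and one was seen,
        # or if it has fewer than min_count non-null values.
        if (saw_null and not skip_nulls) or count < min_count:
            result[key] = None
        else:
            result[key] = total
    return result


# The six rows of the example table: column "x" and column "key".
x = [2, 5, None, None, None, 5]
key = ["a", "a", "b", "b", None, None]
print(grouped_sum(x, key))
```

Note that the null key forms a group of its own, and that group "b" yields null because it contains no non-null values.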





