pitrou commented on a change in pull request #10887:
URL: https://github.com/apache/arrow/pull/10887#discussion_r689519373
##########
File path: docs/source/cpp/compute.rst
##########
@@ -230,10 +234,64 @@ Notes:
   Note that the output can have less than *N* elements if the input has
   less than *N* distinct values.
 
+  The mode kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
 * \(5) Output is Int64, UInt64 or Float64, depending on the input type.
 
 * \(6) Output is Float64 or input type, depending on QuantileOptions.
 
+  The quantile kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
+* \(6) tdigest/t-digest computes approximate quantiles, and so only needs a
+  fixed amount of memory. See the `reference implementation
+  <https://github.com/tdunning/t-digest>`_ for details.
+
+Hash Aggregations ("group by")

Review comment:
   I don't know if we want to say "grouped aggregation" rather than "hash aggregation". The former describes the semantics, the latter the implementation.
   cc @wesm @nealrichardson @ianmcook for opinions.

##########
File path: docs/source/cpp/compute.rst
##########
+Hash Aggregations ("group by")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hash aggregations are not directly invokable, but are used as part of a group
+by operation. Like scalar aggregations, hash aggregations reduce their input
+to a single output value, but do so on subsets of the input, based on a
+partitioning of the input values on some set of "key" columns, and emit one
+output per input group.

Review comment:
   Since this is not obvious from the description alone, it may be good to give a simple example (for example, calculating a sum while grouping by a single key).
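As an illustration of what such an example could convey, here is a minimal sketch of the semantics of a sum grouped by a single key column. It is written in plain C++17 against the standard library only, not Arrow's compute API, and the names `GroupedSum` and `GroupedSumResult` are hypothetical: a hash table assigns each distinct key a group id the first time the key is seen, and each value is added to the running sum of its group, so the output has one row per group.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: one entry per distinct key, in first-seen order.
struct GroupedSumResult {
  std::vector<std::string> keys;
  std::vector<std::int64_t> sums;
};

// Illustrative "SUM(value) GROUP BY key" over two equal-length columns.
// A hash table maps each distinct key to a dense group id; each value is then
// added to the running sum of its group. This is a sketch of the semantics,
// not Arrow's implementation.
GroupedSumResult GroupedSum(const std::vector<std::string>& keys,
                            const std::vector<std::int64_t>& values) {
  assert(keys.size() == values.size());
  GroupedSumResult result;
  std::unordered_map<std::string, std::size_t> group_ids;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    // Look up the key; if it is absent, assign it the next group id.
    auto [it, inserted] = group_ids.try_emplace(keys[i], result.keys.size());
    if (inserted) {
      // New group: record its key and start its sum at zero.
      result.keys.push_back(keys[i]);
      result.sums.push_back(0);
    }
    result.sums[it->second] += values[i];
  }
  return result;
}
```

For keys `{"a", "b", "a", "c", "b", "a"}` and values `{1, 2, 3, 4, 5, 6}`, this yields the groups `a`, `b`, `c` with sums 10, 7 and 4. Emitting groups in first-seen order is a choice made by this sketch only; it says nothing about the output order Arrow itself uses.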