[GitHub] [arrow] pitrou commented on a change in pull request #10887: ARROW-13311: [C++][Documentation] Document hash aggregate kernels

GitBox Mon, 16 Aug 2021 06:06:04 -0700


pitrou commented on a change in pull request #10887:
URL: https://github.com/apache/arrow/pull/10887#discussion_r689519373




##########
File path: docs/source/cpp/compute.rst
##########
@@ -230,10 +234,64 @@ Notes:
   Note that the output can have less than *N* elements if the input has
   less than *N* distinct values.
 
+  The mode kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
 * \(5) Output is Int64, UInt64 or Float64, depending on the input type.
 
 * \(6) Output is Float64 or input type, depending on QuantileOptions.
 
+  The quantile kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
+* \(6) tdigest/t-digest computes approximate quantiles, and so only needs a
+  fixed amount of memory. See the `reference implementation
+  <https://github.com/tdunning/t-digest>`_ for details.
+
+Hash Aggregations ("group by")

Review comment:
       I don't know if we want to say "grouped aggregation" rather than "hash
aggregation". The former describes the semantics, the latter the
implementation. cc @wesm @nealrichardson @ianmcook for opinions.

##########
File path: docs/source/cpp/compute.rst
##########
@@ -230,10 +234,64 @@ Notes:
   Note that the output can have less than *N* elements if the input has
   less than *N* distinct values.
 
+  The mode kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
 * \(5) Output is Int64, UInt64 or Float64, depending on the input type.
 
 * \(6) Output is Float64 or input type, depending on QuantileOptions.
 
+  The quantile kernel is not a proper aggregate (it is actually a vector
+  function, see below).
+
+* \(6) tdigest/t-digest computes approximate quantiles, and so only needs a
+  fixed amount of memory. See the `reference implementation
+  <https://github.com/tdunning/t-digest>`_ for details.
+
+Hash Aggregations ("group by")
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Hash aggregations are not directly invokable, but are used as part of a group
+by operation. Like scalar aggregations, hash aggregations reduce their input
+to a single output value, but do so on subsets of the input, based on a
+partitioning of the input values on some set of "key" columns, and emit one
+output per input group.

Review comment:
       Since the behavior is not obvious from this description alone, it may
be good to give a simple example (for example, calculating a sum while
grouping by a single key).
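
To illustrate the kind of example meant here, the following is a minimal sketch of the semantics only: a sum grouped by a single key column. It uses plain standard C++ rather than the Arrow compute API, and the function name `GroupedSum` and its signature are hypothetical, chosen purely for illustration. It also assumes groups are reported in order of first appearance, which the quoted text does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (NOT part of Arrow): sums `values` per distinct key.
// One (key, sum) pair is emitted per group, in order of first appearance.
std::vector<std::pair<std::string, int64_t>> GroupedSum(
    const std::vector<std::string>& keys, const std::vector<int64_t>& values) {
  // Maps each distinct key to the index of its group in `out`.
  std::unordered_map<std::string, std::size_t> group_ids;
  std::vector<std::pair<std::string, int64_t>> out;
  for (std::size_t i = 0; i < keys.size() && i < values.size(); ++i) {
    auto it = group_ids.find(keys[i]);
    if (it == group_ids.end()) {
      // First time this key is seen: open a new group with a zero sum.
      it = group_ids.emplace(keys[i], out.size()).first;
      out.emplace_back(keys[i], 0);
    }
    // Fold the value into the running sum of its group.
    out[it->second].second += values[i];
  }
  return out;
}
```

With keys `a, b, a, c, b` and values `1, 2, 3, 4, 5`, this yields one output per group: `a` gives 4, `b` gives 7 and `c` gives 4. A scalar sum over the same values would instead reduce the whole input to the single value 15.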




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
