bkietz commented on a change in pull request #9621:
URL: https://github.com/apache/arrow/pull/9621#discussion_r592436887
##########
File path: cpp/src/arrow/compute/api_aggregate.h
##########
@@ -306,5 +326,34 @@ Result<Datum> TDigest(const Datum& value,
                       const TDigestOptions& options = TDigestOptions::Defaults(),
                       ExecContext* ctx = NULLPTR);
 
+/// \brief Calculate multiple aggregations grouped on multiple keys
+///
+/// \param[in] aggregands datums to which aggregations will be applied
+/// \param[in] keys datums which will be used to group the aggregations
+/// \param[in] options GroupByOptions, encapsulating the names and options of aggregate
+///            functions to be applied and the field names for results in the output.
+/// \return a StructArray with len(aggregands) + len(keys) fields. The first
+///         len(aggregands) fields are the results of the aggregations for the group
+///         specified by keys in the final len(keys) fields.
+///
+/// For example:
+///   GroupByOptions options = {
+///     .aggregates = {
+///       {"sum", nullptr, "sum result"},
+///       {"mean", nullptr, "mean result"},
+///     },
+///     .key_names = {"str key", "date key"},
+///   };
+///   assert(*GroupBy({[2, 5, 8], [1.5, 2.0, 3.0]},
+///                   {["a", "b", "a"], [today, today, today]},
+///                   options).Equals([
+///     {"sum result": 10, "mean result": 2.25, "str key": "a", "date key": today},
+///     {"sum result": 5, "mean result": 2.0, "str key": "b", "date key": today},
+///   ]))

Review comment:
Since the group id lists are temporary (except in the rare case where we need to partition batches for writing), we will be computing and discarding them on the fly rather than materializing an O(N) set of them. I'll be removing this compute function; as mentioned above it's not necessary for group by to live in the function registry.
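
To illustrate what "computing and discarding them on the fly" could look like, here is a minimal, self-contained sketch in plain C++. It is not Arrow code and not the implementation in this PR: `Batch` and `StreamingSum` are hypothetical names, and a single string key with a single sum aggregate is assumed for brevity. Each call derives the group ids for one batch, folds the values into the running per-group state, and drops the ids before returning, so no id list proportional to the whole input is ever held.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only; none of this is Arrow's API.
struct Batch {
  std::vector<std::string> keys;  // one grouping key per row
  std::vector<int64_t> values;    // one aggregand per row
};

class StreamingSum {
 public:
  // Consume one batch: map each row to a group id, fold the row's value
  // into that group's sum, then let the id list go out of scope.
  void Consume(const Batch& batch) {
    assert(batch.keys.size() == batch.values.size());

    // Temporary group ids: sized by this batch, not by the whole input.
    std::vector<uint32_t> group_ids;
    group_ids.reserve(batch.keys.size());
    for (const std::string& key : batch.keys) {
      auto it = key_to_id_.find(key);
      if (it == key_to_id_.end()) {
        // First time this key is seen: assign the next dense id.
        it = key_to_id_
                 .emplace(key, static_cast<uint32_t>(unique_keys_.size()))
                 .first;
        unique_keys_.push_back(key);
        sums_.push_back(0);
      }
      group_ids.push_back(it->second);
    }

    for (std::size_t row = 0; row < group_ids.size(); ++row) {
      sums_[group_ids[row]] += batch.values[row];
    }
  }  // group_ids is destroyed here; only per-group state survives.

  const std::vector<std::string>& unique_keys() const { return unique_keys_; }
  const std::vector<int64_t>& sums() const { return sums_; }

 private:
  std::unordered_map<std::string, uint32_t> key_to_id_;
  std::vector<std::string> unique_keys_;  // group id -> key, first-seen order
  std::vector<int64_t> sums_;             // group id -> running sum
};
```

Feeding it the keys `["a", "b", "a"]` with values `[2, 5, 8]` from the doc comment above leaves sums of 10 for `"a"` and 5 for `"b"`. State retained between batches grows with the number of distinct keys rather than the number of rows.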