[GitHub] [arrow] chungg commented on a change in pull request #11624: ARROW-14608: [Python] Provide access to hash_aggregate functions through a Table.group_by method

GitBox Tue, 16 Nov 2021 13:19:29 -0800


chungg commented on a change in pull request #11624:
URL: https://github.com/apache/arrow/pull/11624#discussion_r750400880




##########
File path: python/pyarrow/table.pxi
##########
@@ -2192,6 +2192,83 @@ cdef class Table(_PandasConvertible):
 
         return table
 
+    def group_by(self, keys, columns, aggregations):
+        """
+        Perform a group by aggregation over the columns of the table.
+
+        Parameters
+        ----------
+        keys : str or list[str]
+            Name of the columns that should be used as the grouping key.
+        columns : list of str
+            Names of the columns that contain values for the aggregations.
+        aggregations : str or list[str] or list of tuple(str, FunctionOptions)
+            Name of the hash aggregation function, or list of aggregation
+            function names or list of aggregation function names together
+            with their options.
+
+        Returns
+        -------
+        Table
+            Results of the aggregation functions.
+        """
+        if isinstance(aggregations, str):
+            aggregations = [aggregations]
+
+        if isinstance(keys, str):
+            keys = [keys]
+
+        aggrs = []
+        for aggr in aggregations:
+            if isinstance(aggr, str):
+                aggr = (aggr, None)
+            if not aggr[0].startswith("hash_"):
+                aggr = ("hash_" + aggr[0], aggr[1])
+            aggrs.append(aggr)
+
+        # Build unique names for aggregation result columns
+        # so that it's obvious what they refer to.
+        column_names = []
+        for value_name, (aggr_name, _) in zip(columns, aggrs):
+            column_names.append(aggr_name.replace("hash", value_name))
+        for key_name in keys:
+            column_names.append(key_name)

Review comment:
       Nit: the two loops can be collapsed into a single list comprehension.
   ```suggestion
           # Build unique names for aggregation result columns
           # so that it's obvious what they refer to.
           column_names = [aggr_name.replace("hash", value_name)
                           for value_name, (aggr_name, _) in zip(columns, aggrs)] + keys
   ```
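
To check that the suggested comprehension produces the same names as the two loops in the diff, here is a minimal standalone sketch. The helper name `build_column_names` is hypothetical and not part of pyarrow; it only reproduces the normalization and naming steps shown in the diff and needs no Arrow import.

```python
def build_column_names(keys, columns, aggregations):
    """Hypothetical helper mirroring the naming logic from the diff."""
    # Accept a single name as shorthand for a one-element list.
    if isinstance(aggregations, str):
        aggregations = [aggregations]
    if isinstance(keys, str):
        keys = [keys]

    # Normalize every aggregation to a ("hash_<name>", options) pair.
    aggrs = []
    for aggr in aggregations:
        if isinstance(aggr, str):
            aggr = (aggr, None)
        if not aggr[0].startswith("hash_"):
            aggr = ("hash_" + aggr[0], aggr[1])
        aggrs.append(aggr)

    # Original form: two explicit loops.
    loop_names = []
    for value_name, (aggr_name, _) in zip(columns, aggrs):
        loop_names.append(aggr_name.replace("hash", value_name))
    for key_name in keys:
        loop_names.append(key_name)

    # Suggested form: one comprehension plus list concatenation.
    comprehension_names = [aggr_name.replace("hash", value_name)
                           for value_name, (aggr_name, _)
                           in zip(columns, aggrs)] + keys

    return loop_names, comprehension_names


loop_names, comprehension_names = build_column_names(
    "k", ["a", "b"], ["sum", ("hash_min", None)])
print(loop_names)
print(comprehension_names)
```

The two forms agree for the documented input types. One difference: `+ keys` requires `keys` to be a list at that point, so a tuple of key names would raise a `TypeError`, whereas the loop accepts any iterable.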




