[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11624: ARROW-14608: [Python] Provide access to hash_aggregate functions through a Table.group_by method

GitBox Wed, 17 Nov 2021 00:33:02 -0800


jorisvandenbossche commented on a change in pull request #11624:
URL: https://github.com/apache/arrow/pull/11624#discussion_r751007180




##########
File path: python/pyarrow/_compute.pyx
##########
@@ -1294,3 +1294,30 @@ class TDigestOptions(_TDigestOptions):
         if not isinstance(q, (list, tuple, np.ndarray)):
             q = [q]
         self._set_options(q, delta, buffer_size, skip_nulls, min_count)
+
+
+def _group_by(args, keys, aggregations):

Review comment:
       > I made it internal because we plan to replace this with the exec engine in the long term, so I guess that the `Table.group_by` implementation will switch to something different in the future.
   
   Couldn't the same be done for a `pyarrow.compute` function? (It doesn't map 1:1 to a C++ kernel anyway.)
   
   For me, one reason to put it in the compute module as `pc.group_by(table, keys, ...)` is that it partly sidesteps the 1-step vs. 2-step API discussion for the method. For a function in compute, I think it's totally fine for it to be a one-step function.
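   To make the 1-step vs. 2-step distinction concrete, here is a minimal pure-Python sketch. It models a table as a dict of column lists instead of a `pyarrow.Table`, and every name in it (`_hash_group_by`, `group_by`, `TableGroupBy`, the `(column, function)` aggregation tuples, the `<column>_<function>` output naming) is hypothetical and not the API in this PR. Both entry points delegate to one shared internal helper, so replacing that helper later (e.g. with the exec engine) would not change either public signature.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple, Union

# Stand-in for pyarrow.Table: column name -> list of values (illustration only).
Table = Dict[str, List]
# Hypothetical aggregation spec: (column name, aggregate function name).
Aggregation = Tuple[str, str]

# Hypothetical mapping from aggregate names to plain-Python reducers.
_REDUCERS: Dict[str, Callable[[List], object]] = {
    "sum": sum, "count": len, "min": min, "max": max,
}


def _hash_group_by(table: Table, keys: Union[str, Sequence[str]],
                   aggregations: Sequence[Aggregation]) -> Table:
    """Shared internal helper: hash rows by key tuple, then reduce each group."""
    keys = [keys] if isinstance(keys, str) else list(keys)
    groups: Dict[tuple, List[int]] = defaultdict(list)
    for i in range(len(table[keys[0]])):
        groups[tuple(table[k][i] for k in keys)].append(i)

    # Output columns: the keys first, then one "<column>_<function>" per aggregation.
    result: Table = {k: [] for k in keys}
    for col, func in aggregations:
        result[f"{col}_{func}"] = []
    # Groups come out in first-seen order because dicts preserve insertion order.
    for key_values, rows in groups.items():
        for k, v in zip(keys, key_values):
            result[k].append(v)
        for col, func in aggregations:
            values = [table[col][i] for i in rows]
            result[f"{col}_{func}"].append(_REDUCERS[func](values))
    return result


# One-step style, as a compute-module function might look: a single call.
def group_by(table: Table, keys: Union[str, Sequence[str]],
             aggregations: Sequence[Aggregation]) -> Table:
    return _hash_group_by(table, keys, aggregations)


# Two-step style, as a Table method might look: choose keys, then aggregate.
class TableGroupBy:
    def __init__(self, table: Table, keys: Union[str, Sequence[str]]):
        self._table = table
        self._keys = keys

    def aggregate(self, aggregations: Sequence[Aggregation]) -> Table:
        return _hash_group_by(self._table, self._keys, aggregations)


table = {"key": ["a", "b", "a"], "value": [1, 2, 3]}
print(group_by(table, ["key"], [("value", "sum")]))
# {'key': ['a', 'b'], 'value_sum': [4, 2]}
print(TableGroupBy(table, "key").aggregate([("value", "sum")]))
# {'key': ['a', 'b'], 'value_sum': [4, 2]}
```

   The only difference between the two forms is the intermediate `TableGroupBy` object: the function form can stay a single call without committing to that two-step shape.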




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

