westonpace commented on code in PR #15083:
URL: https://github.com/apache/arrow/pull/15083#discussion_r1081477937


##########
python/pyarrow/_compute.pyx:
##########
@@ -2202,12 +2202,18 @@ def _group_by(args, keys, aggregations):
     _pack_compute_args(args, &c_args)
     _pack_compute_args(keys, &c_keys)
 
-    for aggr_func_name, aggr_opts in aggregations:
+    # reference into the flattened list of arguments for the aggregations
+    field_ref = 0
+    for aggr_arg_names, aggr_func_name, aggr_opts in aggregations:
         c_aggr.function = tobytes(aggr_func_name)
         if aggr_opts is not None:
             c_aggr.options = (<FunctionOptions?>aggr_opts).wrapped
         else:
             c_aggr.options = <shared_ptr[CFunctionOptions]>nullptr
+        for _ in aggr_arg_names:
+            c_aggr.target.push_back(CFieldRef(<int> field_ref))
+            field_ref += 1

Review Comment:
   I had simplified this in #14867 to just get rid of the field refs but I 
think I'll need to revisit that logic after this merges.  I think it could be 
made a lot simpler.
   
   Today we are going Table -> vector<Array> -> C++ -> Table -> ExecPlan -> 
Table -> python
   
   Because python is giving C++ a vector of arrays there are no "names" and 
named refs wouldn't make sense.  It also means that C++ has to invent names for 
the output columns.  Then python just renames and clobbers those invented names.
   
   I think this would be a lot simpler as just:
   
   Table -> C++ -> ExecPlan -> Table -> python
   
   I'll put #14867 on hold for now and take a look again when this merges.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to