westonpace commented on code in PR #36253:
URL: https://github.com/apache/arrow/pull/36253#discussion_r1245568309
##########
python/pyarrow/src/arrow/python/udf.cc:
##########
@@ -215,9 +247,164 @@ struct PythonUdfScalarAggregatorImpl : public ScalarUdfAggregator {
return Status::OK();
}
- UdfWrapperCallback agg_cb;
+ std::shared_ptr<OwnedRefNoGIL> function;
+ UdfWrapperCallback cb;
std::vector<std::shared_ptr<RecordBatch>> values;
- std::shared_ptr<OwnedRefNoGIL> agg_function;
+ std::shared_ptr<Schema> input_schema;
+ std::shared_ptr<DataType> output_type;
+};
+
+struct PythonUdfHashAggregatorImpl : public HashUdfAggregator {
+ PythonUdfHashAggregatorImpl(std::shared_ptr<OwnedRefNoGIL> function,
+ UdfWrapperCallback cb,
+ std::vector<std::shared_ptr<DataType>> input_types,
+ std::shared_ptr<DataType> output_type)
+ : function(function), cb(std::move(cb)), output_type(std::move(output_type)) {
+ Py_INCREF(function->obj());
Review Comment:
These `INCREF`s still seem superfluous to me, but I don't think it's
critical. We could test in a follow-up, using temporary function registries,
to see whether we are preventing UDF functions from being garbage collected.
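A follow-up test along these lines could use `weakref` from the standard library to observe whether a function object is actually collectable once the registry drops it. This is only a minimal sketch of the idea; `make_udf`, `my_udf`, and the `registry` dict are hypothetical stand-ins for a pyarrow function registry, and the extra strong reference in the dict plays the role of an unnecessary `Py_INCREF`:

```python
import gc
import weakref

def make_udf():
    # Hypothetical stand-in for a user-defined aggregate function.
    def my_udf(ctx, values):
        return values
    return my_udf

udf = make_udf()
ref = weakref.ref(udf)

# Simulate a registry holding an extra strong reference, as a
# superfluous Py_INCREF would.
registry = {"my_udf": udf}

del udf
gc.collect()
assert ref() is not None  # still alive: the registry keeps it reachable

registry.clear()
gc.collect()
assert ref() is None  # collected once the last strong reference is gone
```

If the real registry (or a leaked `INCREF`) keeps the function pinned, the final assertion would fail, which is exactly the signal such a follow-up test would look for.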
##########
python/pyarrow/_compute.pyx:
##########
@@ -2767,6 +2767,9 @@ def register_aggregate_function(func, function_name, function_doc, in_types, out
This is often used with ordered or segmented aggregation where groups
can be emit before accumulating all of the input data.
+ Note that currently size of any input column can not exceed 2 GB limit
+ (all groups combined).
Review Comment:
```suggestion
    Note that currently the size of any input column cannot exceed 2 GB
    for a single segment (all groups combined).
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]