icexelloss commented on code in PR #35514:
URL: https://github.com/apache/arrow/pull/35514#discussion_r1221691124


##########
python/pyarrow/src/arrow/python/udf.cc:
##########
@@ -234,6 +351,56 @@ Status RegisterTabularFunction(PyObject* user_function, 
UdfWrapperCallback wrapp
       wrapper, options, registry);
 }
 
+Status AddAggKernel(std::shared_ptr<compute::KernelSignature> sig, 
compute::KernelInit init,
+                           compute::ScalarAggregateFunction* func) {
+
+  compute::ScalarAggregateKernel kernel(std::move(sig), std::move(init), 
AggregateUdfConsume, AggregateUdfMerge, AggregateUdfFinalize, 
/*ordered=*/false);
+  RETURN_NOT_OK(func->AddKernel(std::move(kernel)));
+  return Status::OK();
+}
+
+Status RegisterAggregateFunction(PyObject* agg_function, UdfWrapperCallback 
agg_wrapper,
+                                               const UdfOptions& options,
+                                               compute::FunctionRegistry* 
registry) {
+  if (!PyCallable_Check(agg_function)) {
+    return Status::TypeError("Expected a callable Python object.");
+  }
+
+  if (registry == NULLPTR) {
+    registry = compute::GetFunctionRegistry();
+  }
+
+  static auto default_scalar_aggregate_options = 
compute::ScalarAggregateOptions::Defaults();
+  auto aggregate_func = std::make_shared<compute::ScalarAggregateFunction>(
+      options.func_name, options.arity, options.func_doc, 
&default_scalar_aggregate_options);
+
+  Py_INCREF(agg_function);

Review Comment:
   I looked into this more and observed the following.
   
   The refcount of Python function is 2 before registering with Arrow, and is 3 
after registering. After invoking the function with arrow compute, the refcount 
doesn't increase. (still 3) So I believe the code is correct.
   
   On the other hand, I did observe scalar UDF might have a refcount leaking (I 
observed the refcount increases by 1 after calling the function with arrow 
compute). Which I can look into separately.
   
   In summary, my understanding is that every time we wrap the PyObject* into a 
OwnedRef, we need to increase the refcount by 1 so the refcount doesn't change 
after the destructor of OwnedRef gets invoked. I am 90% certain this is correct 
understanding of how it should work but happy to hear more thoughts @westonpace 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to