jorisvandenbossche commented on code in PR #13687:
URL: https://github.com/apache/arrow/pull/13687#discussion_r982662806


##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,133 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+
+PyArrow allows defining and registering custom compute functions.
+These functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined. Using :func:`pyarrow.compute.register_scalar_function`,
+
+.. code-block:: python
+
+   import numpy as np
+
+   import pyarrow as pa
+   import pyarrow.compute as pc
+
+   function_name = "numpy_gcd"
+   function_docs = {
+         "summary": "Calculates the greatest common divisor",
+         "description":
+            "Given 'x' and 'y' find the greatest number that divides\n"
+            "evenly into both x and y."
+   }
+
+   input_types = {
+      "x" : pa.int64(),
+      "y" : pa.int64()
+   }
+
+   output_type = pa.int64()
+
+   def to_np(val):
+      if isinstance(val, pa.Scalar):
+         return val.as_py()

Review Comment:
   OK, so it seems that we are _still_ passing scalars to the UDF in the case
of mixed scalar / array arguments, as in `pc.call_function("numpy_gcd",
[pa.scalar(27), pa.array([81, 12, 5])])`.
   
   My earlier understanding was wrong because I had only tested the case of
two scalars: `pc.call_function("numpy_gcd", [pa.scalar(27), pa.scalar(81)])`.
That actually works by passing two length-1 arrays to the UDF, and for that
reason I assumed we always pass scalars to the UDF implementation as length-1
arrays.
   
   Personally, I find this a bit confusing (but I am not familiar enough with
the internals of the kernels to know whether it would be easy to always pass
arrays to the UDF). At the least, I would document that the arguments can be
scalars, but that they will never all be scalars at the same time (since, as I
mentioned above, the current example UDF implementation would not work if it
were passed only scalars).
   


