vibhatha commented on code in PR #13687:
URL: https://github.com/apache/arrow/pull/13687#discussion_r929466056


##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,129 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and an output type need to be defined.
+
+.. code-block:: python
+
+   import pyarrow as pa
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using :func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+.. note::
+   When all values passed to a function are scalars, each scalar is
+   internally passed as an array of size 1.
+
+More generally, user-defined functions are usable everywhere a compute function
+can be referred to by its name. For example, they can be called on a dataset's
+column using :meth:`Expression._call`.
+
+Consider a series of scalar inputs:
+
+.. code-block:: python
+
+   >>> import pyarrow as pa
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])

Review Comment:
   @pitrou should we keep this example or just keep a note/warning? I am 
inclined towards the example, though. 



