westonpace commented on code in PR #13687:
URL: https://github.com/apache/arrow/pull/13687#discussion_r951530500


##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined and passed to :func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept inputs of both scalar and array types, and any
+combination of these types. It is important that the UDF author ensures
+the UDF can handle such combinations correctly.
+
+For instance, when the values passed to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar `y` value based on scalar inputs 
+`m`, `x` and `c` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                             function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> 
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), 
pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>
+
+Note that here the final output is returned as an array. Whatever libraries
+are used inside the UDF, make sure it is generalized to handle the passed
+input values and return suitable values.
+
+Working with Datasets
+---------------------
+
+More generally, user-defined functions are usable everywhere a compute function
+can be referred to by its name. For example, they can be called on a dataset's
+column using :meth:`Expression._call`.
+
+Consider a case where the data is in a table and you need to create a new
+column from the existing values of a column using a mathematical formula.
+As an example, let's apply the simple affine operation `y = mx + c` to the
+values, re-using the registered `affine` function.
+
+.. code-block:: python
+
+   >>> import pyarrow.dataset as ds
+   >>> sample_data = {'category': ['A', 'B', 'C', 'D'], 'value': [10.21, 
20.12, 45.32, 15.12]}
+   >>> data_table = pa.Table.from_pydict(sample_data)
+   >>> dataset = ds.dataset(data_table)
+   >>> func_args = [pc.scalar(5.2), ds.field("value"), pc.scalar(2.1)]
+   >>> dataset.to_table(
+   ...             columns={
+   ...                 'projected_value': ds.field('')._call("affine", 
func_args),
+   ...                 'value': ds.field('value'),
+   ...                 'category': ds.field('category')
+   ...             })
+   pyarrow.Table
+   projected_value: double
+   value: double
+   category: string
+   ----
+   projected_value: [[55.192,106.724,237.764,80.724]]
+   value: [[10.21,20.12,45.32,15.12]]
+   category: [["A","B","C","D"]]
+
+Note here that `ds.field('')._call()` returns an expression. The arguments
+passed to this function call are expressions, not scalar values
+(i.e. `pc.scalar(5.2), ds.field("value"), pc.scalar(2.1)`). The expression is
+evaluated when the project operator uses it.
+
+Support
+-------
+
+Only scalar functions are currently supported.
+A scalar function (:class:`arrow::compute::ScalarFunction`) executes
+elementwise operations on arrays or scalars. Generally, the result of such an
+execution doesn't depend on the order of values.
+
+The current API also places some limitations on how UDFs can be used.
+For instance, when a UDF is used as the compute function of a project node,
+the node expects the function to be a scalar function. Nothing prevents a user
+from registering a non-scalar function and using it in a program, but doing so
+can lead to unexpected behavior or errors. UDF support could be enhanced in
+the future with additions to the API (e.g. aggregate UDFs).

Review Comment:
   ```suggestion
   Projection Expressions
   ^^^^^^^^^^^^^^^^^^^^^^
   
   In the above example we used an expression to add a new column (`projected_value`)
   to our table.  Adding new, dynamically computed, columns to a table is known 
as "projection"
   and there are limitations on what kinds of functions can be used in 
projection expressions.
   
   A projection function must emit a single output value for each input row.  
That output value
   should be calculated entirely from the input row and should not depend on 
any other row.
   For example, the "affine" function that we've been using as an example above 
is a valid
   function to use in a projection.  A "cumulative sum" function would not be a 
valid function
since the result for each input row depends on the rows that came before.  A 
"drop nulls"
   function would also be invalid because it doesn't emit a value for some rows.
   ```
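
   The distinction above can be sketched in plain Python (illustrative only, not
   the PyArrow UDF API): a valid projection function maps each input row to
   exactly one output value, using only that row.

   ```python
   # Plain-Python sketch of the projection contract. Names and thresholds are
   # illustrative, not part of PyArrow.

   def affine(row, m=5.2, c=2.1):
       # Valid projection: the output depends only on the current row.
       return m * row + c

   def cumulative_sum(rows):
       # Invalid as a projection: each output depends on all prior rows.
       total, out = 0.0, []
       for r in rows:
           total += r
           out.append(total)
       return out

   def drop_nulls(rows):
       # Invalid as a projection: some rows emit no output at all.
       return [r for r in rows if r is not None]

   rows = [1.0, 2.0, None, 4.0]
   projected = [affine(r) if r is not None else None for r in rows]
   assert len(projected) == len(rows)          # one output per input row
   assert len(drop_nulls(rows)) != len(rows)   # row count changes: not a projection
   ```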
   



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined and passed to :func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept input types of both scalar and array. Also it can have
+vivid combinations of these types. It is important that the UDF author must 
make sure,
+the UDF is defined such that it can handle such combinations well. 

Review Comment:
   ```suggestion
   any combination of these types. It is important that the UDF author ensures
   the UDF can handle such combinations correctly. 
   ```



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined. Using 
:func:`pyarrow.compute.register_scalar_function`,
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept inputs of both scalar and array types, and any
+combination of these types. It is important that the UDF author ensures
+the UDF can handle such combinations correctly.
+
+For instance, when the passed values to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar `y` value based on scalar inputs 
+`m`, `x` and `c` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                             function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> 
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), 
pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>
+
+Note that here the the final output is returned as an array. Depending the 
usage of vivid libraries

Review Comment:
   I'm not sure what you are trying to say with the sentence that starts 
"Depending the..."



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,129 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and output type need to be defined.
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+.. note::
+   Note that when the passed values to a function are all scalars, internally 
each scalar 
+   is passed as an array of size 1.
+
+More generally, user-defined functions are usable everywhere a compute function
+can be referred to by its name. For example, they can be called on a dataset's
+column using :meth:`Expression._call`. Consider a series of scalar inputs:
+
+.. code-block:: python
+
+   >>> import pyarrow as pa
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])

Review Comment:
   It is interesting, I think, that a person can define a UDF without using the
Arrow compute functions at all; that is the most compelling point of the UDF
feature in my mind, since compositions of Arrow compute functions could already
be done using expressions.
   
   However, it is not clear from the description that this is the purpose of 
this example (is it?).  It's also perhaps not the most motivating example since 
it can be expressed as an Arrow expression.
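
   For instance (a hypothetical sketch, not from the PR), a UDF body could use
   arbitrary Python logic — branching and a dict lookup — that would be awkward
   to build from Arrow compute calls. The wrapper below mimics the scalar UDF
   "arrays in, array out" contract; the names are illustrative only.

   ```python
   # Illustrative thresholds for the hypothetical categorize UDF body.
   THRESHOLDS = {"low": 10.0, "high": 100.0}

   def categorize(values):
       # Arbitrary Python control flow, no Arrow compute calls: maps each
       # numeric value to a category label, one output per input.
       out = []
       for v in values:
           if v < THRESHOLDS["low"]:
               out.append("low")
           elif v < THRESHOLDS["high"]:
               out.append("mid")
           else:
               out.append("high")
       return out

   assert categorize([5.0, 50.0, 500.0]) == ["low", "mid", "high"]
   ```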



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined and passed to :func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept inputs of both scalar and array types, and any
+combination of these types. It is important that the UDF author ensures
+the UDF can handle such combinations correctly.
+
+For instance, when the passed values to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar `y` value based on scalar inputs 
+`m`, `x` and `c` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                             function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> 
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), 
pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>
+
+Note that here the final output is returned as an array. Whatever libraries
+are used inside the UDF, make sure it is generalized to handle the passed
+input values and return suitable values.
+
+Working with Datasets
+---------------------
+
+More generally, user-defined functions are usable everywhere a compute function
+can be referred to by its name. For example, they can be called on a dataset's
+column using :meth:`Expression._call`.
+
+Consider a case where the data is in a table and you need to create a new
+column from the existing values of a column using a mathematical formula.
+As an example, let's apply the simple affine operation `y = mx + c` to the
+values, re-using the registered `affine` function.
+
+.. code-block:: python
+
+   >>> import pyarrow.dataset as ds
+   >>> sample_data = {'category': ['A', 'B', 'C', 'D'], 'value': [10.21, 
20.12, 45.32, 15.12]}
+   >>> data_table = pa.Table.from_pydict(sample_data)
+   >>> dataset = ds.dataset(data_table)
+   >>> func_args = [pc.scalar(5.2), ds.field("value"), pc.scalar(2.1)]
+   >>> dataset.to_table(
+   ...             columns={
+   ...                 'projected_value': ds.field('')._call("affine", 
func_args),
+   ...                 'value': ds.field('value'),
+   ...                 'category': ds.field('category')
+   ...             })
+   pyarrow.Table
+   projected_value: double
+   value: double
+   category: string
+   ----
+   projected_value: [[55.192,106.724,237.764,80.724]]
+   value: [[10.21,20.12,45.32,15.12]]
+   category: [["A","B","C","D"]]
+
+Here note that `ds.field('')._call()` returns an expression. The passed arguments
+to this function call are expressions not scalar values 
+(i.e `pc.scalar(5.2), ds.field("value"), pc.scalar(2.1)`). This expression is 
evaluated
+when the project operator uses this expression.

Review Comment:
   You say "The passed arguments to this function call are expressions not 
scalar values".
   
   However, `pc.scalar(5.2)` and `pc.scalar(2.1)` look like scalar values.  I'm 
not sure a user will recognize the subtle difference between `pc.scalar(5.2)` 
and `pa.scalar(5.2)` without further explanation.
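   
   To make that distinction concrete, here is a plain-Python model (illustrative
   only, not PyArrow's actual classes): `pa.scalar` behaves like a concrete value
   produced immediately, while `pc.scalar` behaves like a deferred *literal
   expression* node, evaluated only when the engine binds it against a batch.
   
   ```python
   # Conceptual sketch of deferred expressions vs. immediate values.
   # Literal and FieldRef are stand-ins, not PyArrow internals.

   class Literal:
       """Stand-in for the deferred expression that pc.scalar(...) returns."""
       def __init__(self, value):
           self.value = value
       def evaluate(self, batch):
           # A literal ignores column data and repeats its value for every row.
           n_rows = len(next(iter(batch.values())))
           return [self.value] * n_rows

   class FieldRef:
       """Stand-in for ds.field(...): resolved against each batch by name."""
       def __init__(self, name):
           self.name = name
       def evaluate(self, batch):
           return batch[self.name]

   batch = {"value": [10.0, 20.0]}
   m, x, c = Literal(5.2), FieldRef("value"), Literal(2.1)
   # y = m*x + c, evaluated row by row when the "engine" runs the expression.
   ys = [mv * xv + cv
         for mv, xv, cv in zip(m.evaluate(batch), x.evaluate(batch), c.evaluate(batch))]
   assert all(abs(y - e) < 1e-9 for y, e in zip(ys, [54.1, 106.1]))
   ```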



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined and passed to :func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept input types of both scalar and array. Also it can have
+vivid combinations of these types. It is important that the UDF author must 
make sure,

Review Comment:
   I think `any` would be better.



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` 
method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset 
documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined and passed to :func:`pyarrow.compute.register_scalar_function`:
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using 
:func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), 
pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs accept inputs of both scalar and array types, and any
+combination of these types. It is important that the UDF author ensures
+the UDF can handle such combinations correctly.
+
+For instance, when the passed values to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar `y` value based on scalar inputs 
+`m`, `x` and `c` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                         function_docs,

Review Comment:
   ```suggestion
      ...                             function_docs,
   ```



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` 
filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined. Using :func:`pyarrow.compute.register_scalar_function`,
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using :func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs can receive scalar and array inputs, in any combination. The
+UDF author must ensure the UDF is defined so that it handles all such
+combinations correctly.
+
+For instance, when the passed values to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar ``y`` value based on scalar inputs
+``m``, ``x`` and ``c`` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                         function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> 
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>
+
+Note that here the final output is returned as an array. Depending on the usage of various libraries

Review Comment:
   Is it an array?  I see:
   
   ```
   <pyarrow.DoubleScalar: 123.22>
   ```
   
   It probably should be an array.



##########
docs/source/python/compute.rst:
##########
@@ -370,3 +370,165 @@ our ``even_filter`` with a ``pc.field("nums") > 5`` filter:
 
 :class:`.Dataset` currently can be filtered using :meth:`.Dataset.to_table` method
 passing a ``filter`` argument. See :ref:`py-filter-dataset` in Dataset documentation.
+
+
+User-Defined Functions
+======================
+
+.. warning::
+   This API is **experimental**.
+   Also, only scalar functions can currently be user-defined.
+
+PyArrow allows defining and registering custom compute functions in Python.
+Those functions can then be called from Python as well as C++ (and potentially
+any other implementation wrapping Arrow C++, such as the R ``arrow`` package)
+using their registered function name.
+
+To register a UDF, a function name, function docs, input types and
+output type need to be defined. Using :func:`pyarrow.compute.register_scalar_function`,
+
+.. code-block:: python
+
+   import pyarrow.compute as pc
+   function_name = "affine"
+   function_docs = {
+      "summary": "Calculate y = mx + c",
+      "description":
+          "Compute the affine function y = mx + c.\n"
+          "This function takes three inputs, m, x and c, in order."
+   }
+   input_types = {
+      "m" : pa.float64(),
+      "x" : pa.float64(),
+      "c" : pa.float64(),
+   }
+   output_type = pa.float64()
+
+   def affine(ctx, m, x, c):
+       temp = pc.multiply(m, x, memory_pool=ctx.memory_pool)
+       return pc.add(temp, c, memory_pool=ctx.memory_pool)
+
+   pc.register_scalar_function(affine, 
+                               function_name,
+                               function_docs,
+                               input_types,
+                               output_type)
+
+The implementation of a user-defined function always takes a first *context*
+parameter (named ``ctx`` in the example above) which is an instance of
+:class:`pyarrow.compute.ScalarUdfContext`.
+This context exposes several useful attributes, particularly a
+:attr:`~pyarrow.compute.ScalarUdfContext.memory_pool` to be used for
+allocations in the context of the user-defined function.
+
+You can call a user-defined function directly using :func:`pyarrow.compute.call_function`:
+
+.. code-block:: python
+
+   >>> pc.call_function("affine", [pa.scalar(2.5), pa.scalar(10.5), pa.scalar(5.5)])
+   <pyarrow.DoubleScalar: 31.75>
+
+Generalizing Usage
+------------------
+
+PyArrow UDFs can receive scalar and array inputs, in any combination. The
+UDF author must ensure the UDF is defined so that it handles all such
+combinations correctly.
+
+For instance, when the passed values to a function are all scalars, internally
+each scalar is passed as an array of size 1.
+
+To elaborate on this, let's consider a scenario where we have a function
+which computes a scalar ``y`` value based on scalar inputs
+``m``, ``x`` and ``c`` using Python arithmetic operations.
+
+.. code-block:: python
+
+   >>> import pyarrow.compute as pc
+   >>> function_name = "affine_with_python"
+   >>> function_docs = {
+   ...        "summary": "Calculate y = mx + c with Python",
+   ...        "description":
+   ...            "Compute the affine function y = mx + c.\n"
+   ...            "This function takes three inputs, m, x and c, in order."
+   ... }
+   >>> input_types = {
+   ...    "m" : pa.float64(),
+   ...    "x" : pa.float64(),
+   ...    "c" : pa.float64(),
+   ... }
+   >>> output_type = pa.float64()
+   >>> 
+   >>> def affine_with_python(ctx, m, x, c):
+   ...     m = m[0].as_py()
+   ...     x = x[0].as_py()
+   ...     c = c[0].as_py()
+   ...     return pa.array([m * x + c])
+   ... 
+   >>> pc.register_scalar_function(affine_with_python,
+   ...                             function_name,
+   ...                         function_docs,
+   ...                             input_types,
+   ...                             output_type)
+   >>> 
+   >>> pc.call_function(function_name, [pa.scalar(10.1), pa.scalar(10.2), pa.scalar(20.2)])
+   <pyarrow.DoubleScalar: 123.22>

Review Comment:
   The function correctly handles the all-scalar case but it does not handle
   other cases.  Ideally, an example should demonstrate how to write a UDF that
   can handle all possible cases.  For example:
   
   ```
   print('10.0 * 5.0 + 1.0 should be 51.0')
   print(f'Answer={pc.call_function(function_name, [pa.scalar(10.0), pa.scalar(5.0), pa.scalar(1.0)])}')
   
   print('[10.0, 10.0] * [5.0, 6.0] + [1.0, 1.0] should be [51.0, 61.0]')
   print(f'Answer={pc.call_function(function_name, [pa.array([10.0, 10.0]), pa.array([5.0, 6.0]), pa.array([1.0, 1.0])])}')
   
   print('10.0 * [5.0, 6.0] + 1.0 should be [51.0, 61.0]')
   print(f'Answer={pc.call_function(function_name, [pa.scalar(10.0), pa.array([5.0, 6.0]), pa.scalar(1.0)])}')
   ```
   
   Right now, the function as it is designed, gives me this output:
   
   ```
   10.0 * 5.0 + 1.0 should be 51.0
   Answer=51.0
   [10.0, 10.0] * [5.0, 6.0] + [1.0, 1.0] should be [51.0, 61.0]
   Answer=[
     51
   ]
   10.0 * [5.0, 6.0] + 1.0 should be [51.0, 61.0]
   Traceback (most recent call last):
     File "/home/pace/experiments/arrow-17181/repr.py", line 39, in <module>
       print(f'Answer={pc.call_function(function_name, [pa.scalar(10.0), pa.array([5.0, 6.0]), pa.scalar(1.0)])}')
     File "pyarrow/_compute.pyx", line 560, in pyarrow._compute.call_function
     File "pyarrow/_compute.pyx", line 355, in pyarrow._compute.Function.call
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/_compute.pyx", line 2506, in pyarrow._compute._scalar_udf_callback
     File "/home/pace/experiments/arrow-17181/repr.py", line 21, in affine_with_python
       m = m[0].as_py()
   TypeError: 'pyarrow.lib.DoubleScalar' object is not subscriptable
   ```
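
   One way to make the example pass all three calls above is to branch on the
   input kinds and broadcast scalar inputs to the common array length. The
   helper below is a hypothetical pure-Python sketch of that broadcasting
   logic (plain floats and lists stand in for PyArrow scalars and arrays;
   none of these names are part of the PyArrow API):

```python
# Hypothetical sketch: compute y = m*x + c over any mix of scalar (float)
# and array (list) inputs, mirroring the branching a generalized UDF body
# would need to perform.
def affine_broadcast(m, x, c):
    arrays = [v for v in (m, x, c) if isinstance(v, list)]
    if not arrays:
        # All-scalar case: return a plain scalar result.
        return m * x + c
    n = len(arrays[0])

    def at(v, i):
        # Index into arrays; repeat scalars at every position.
        return v[i] if isinstance(v, list) else v

    return [at(m, i) * at(x, i) + at(c, i) for i in range(n)]
```

   In real PyArrow code the same effect falls out of delegating to compute
   kernels such as `pc.multiply` and `pc.add`, which already broadcast
   scalars against arrays (as the first `affine` example in the hunk does).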



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
