Joris Van den Bossche created ARROW-17827:
---------------------------------------------
Summary: [Python] Allow calling UDF kernels with field/scalar
expressions
Key: ARROW-17827
URL: https://issues.apache.org/jira/browse/ARROW-17827
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche
From https://github.com/apache/arrow/pull/13687#issuecomment-1240399112, where
it came up while adding documentation on how to use UDFs in Python. When just
wanting to invoke a UDF with arrays, you can do {{pc.call_function("my_udf",
[arr])}}.
But if you want to use your UDF in a context that needs an expression (e.g. a
dataset projection), you need to be able to call the UDF with expressions as
arguments. Currently, {{pc.call_function}} doesn't work that way (it expects
actual, materialized arrays/scalars as arguments). As a workaround, you can use
the private {{Expression._call}}:
{code:python}
# doesn't work with expressions
>>> pc.call_function("my_udf", [pc.field("col")])
...
TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function
# workaround
>>> pc.Expression._call("my_udf", [pc.field("col")])
<pyarrow.compute.Expression my_udf(col)>
{code}
So we should try to improve the usability here. Some options:
* See if we can change {{pc.call_function}} to also accept Expressions as
arguments
* Make {{_call}} public, so one can do {{pc.Expression.call("my_udf",
[..])}}
cc [~westonpace] [~vibhatha]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)