westonpace commented on PR #12590:
URL: https://github.com/apache/arrow/pull/12590#issuecomment-1109543263
> @westonpace agreed.
> Although, internally I think we still need to wrap array or scalar properly right? Because the function call in Python expects an array or scalar.
> The `ExecBatch` could be an array or scalar of type defined by the user. Please correct me if I am wrong.
Yes. My suggestion would not change the input types to the functions
themselves. So in my example:
```
pc.register_scalar_function(some_function, ... { "left": pa.int64(),
"right": pa.int64() }, pa.int64())
```
The function `some_function` should expect `left` to be `pa.Int64Array` or
`pa.Int64Scalar` (and the same for `right`). The arguments do not have to
agree (`left` could be `pa.Int64Array` and `right` could be `pa.Int64Scalar`).
The return value should be either a `pa.Int64Array` or a `pa.Int64Scalar`.
This means making a "robust" UDF is actually rather tricky, because it has
to cope with every combination of array and scalar arguments. However, a UDF
that only handles array inputs and returns an array will work in most
situations.