[
https://issues.apache.org/jira/browse/ARROW-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17490281#comment-17490281
]
Vibhatha Lakmal Abeykoon commented on ARROW-15635:
--------------------------------------------------
[~westonpace] I added more content on the scope of the first iteration for
UDFs.
I don't expect to include every UDF feature in this iteration, but we should
support all the useful cases for the queries we are planning to execute. Are
any pieces missing?
> [C++][Python] UDF Integration
> ------------------------------
>
> Key: ARROW-15635
> URL: https://issues.apache.org/jira/browse/ARROW-15635
> Project: Apache Arrow
> Issue Type: Task
> Components: C++, Python
> Reporter: Vibhatha Lakmal Abeykoon
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
>
> The objective is to list the tasks required to provide UDF support for the
> Apache Arrow streaming execution engine. In the first iteration we will
> focus on supporting Python-based UDFs, that is, UDFs defined as Python
> functions.
> The UDF integration will proceed as a series of sub-tasks covering
> development and PoCs. Note that this is the first iteration of UDF
> integration and has a limited scope. This ticket will cover the following
> topics:
> # PoC for UDF integration: The objective is to evaluate the existing
> components in the source and identify the modifications and new building
> blocks required to integrate UDFs.
> # The language will be limited to C++/Python. Users can register a Python
> function as a UDF and use it with an `apply` method on Arrow Tables or
> provide a computation API endpoint via the arrow::compute API. Note that the
> C++ API already provides a way to register custom functions via the function
> registry API. At the moment this is not exposed to Python.
> # Planned features for this ticket are:
> ## Scalar UDFs : UDFs executed per value (per row)
> ## Vector UDFs : UDFs executed per batch (a full array or partial array)
> ## Aggregate UDFs : UDFs associated with an aggregation operation
> # Integration limitations
> ## Custom data types that NumPy or Pandas do not support are not supported
> ## Complex processing with parallelism within UDFs is not supported
> ## Parallel UDFs are not supported in the initial version of UDFs, although
> we are documenting what is required and a rough sketch for the next phase.