Haiyang Sun created SPARK-55278:
-----------------------------------
Summary: Language-agnostic UDF Protocol for Spark
Key: SPARK-55278
URL: https://issues.apache.org/jira/browse/SPARK-55278
Project: Spark
Issue Type: Improvement
Components: Connect, PySpark
Affects Versions: 4.2
Reporter: Haiyang Sun
Run user-provided code in Spark {*}consistently across many programming
languages{*}.
Today, Spark Connect allows users to write queries from multiple languages, but
support for user-defined functions (UDFs) is incomplete. In practice, only Python
has a mature solution, and it relies on language-specific mechanisms that do not
generalize to other clients such as
[Go|https://github.com/apache/spark-connect-go],
[Rust|https://github.com/apache/spark-connect-rust],
[Swift|https://github.com/apache/spark-connect-swift], and
[.NET|https://github.com/GoEddie/spark-connect-dotnet], none of which currently
support UDFs.
Our objective is to define a *unified API and execution protocol* for
user-defined functions that run outside the Spark engine process and communicate
with it through inter-process communication (IPC). This would allow Spark to
interact with external workers in a consistent way, regardless of the language
used to implement the function.
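To make the idea of a language-agnostic execution protocol more concrete, here is a minimal sketch of what a worker-side loop could look like. It is not the protocol proposed in this issue: the message types ({{init}}, {{batch}}, {{finish}}), the length-prefixed JSON framing, and the helper names {{write_frame}}, {{read_frame}} and {{serve}} are all hypothetical. A real design would more likely carry Arrow record batches over a socket or pipe; JSON is used here only to keep the example dependency-free.

```python
import io
import json
import struct
from typing import BinaryIO, Callable, Dict, List, Optional

# HYPOTHETICAL wire format: a 4-byte big-endian length prefix followed by a
# UTF-8 JSON body. Any language can read and write this without sharing
# language-specific serialization with the engine.


def write_frame(stream: BinaryIO, message: dict) -> None:
    body = json.dumps(message).encode("utf-8")
    stream.write(struct.pack(">I", len(body)))
    stream.write(body)


def read_frame(stream: BinaryIO) -> Optional[dict]:
    header = stream.read(4)
    if len(header) < 4:
        return None  # clean end of stream
    (length,) = struct.unpack(">I", header)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("truncated frame")
    return json.loads(body.decode("utf-8"))


def serve(inp: BinaryIO, out: BinaryIO, registry: Dict[str, Callable]) -> None:
    """Hypothetical worker loop: bind a function by name, then evaluate row batches."""
    func: Optional[Callable] = None
    while (msg := read_frame(inp)) is not None:
        kind = msg["type"]
        if kind == "init":
            # The engine refers to the function by name, not by serialized code.
            func = registry[msg["function"]]
            write_frame(out, {"type": "ready"})
        elif kind == "batch":
            if func is None:
                raise RuntimeError("batch received before init")
            values = [func(*row) for row in msg["rows"]]
            write_frame(out, {"type": "result", "values": values})
        elif kind == "finish":
            write_frame(out, {"type": "done"})
            break
        else:
            raise ValueError(f"unknown message type: {kind}")


# Simulate the engine side with in-memory streams instead of a real pipe.
engine_to_worker = io.BytesIO()
write_frame(engine_to_worker, {"type": "init", "function": "add_one"})
write_frame(engine_to_worker, {"type": "batch", "rows": [[1], [2], [3]]})
write_frame(engine_to_worker, {"type": "finish"})
engine_to_worker.seek(0)

worker_to_engine = io.BytesIO()
serve(engine_to_worker, worker_to_engine, {"add_one": lambda x: x + 1})
worker_to_engine.seek(0)

replies: List[dict] = []
while (reply := read_frame(worker_to_engine)) is not None:
    replies.append(reply)
print(replies)
```

The design point the sketch illustrates is that nothing in the frame format depends on Python: the function is referenced by name and values travel in a neutral encoding, so a worker written in Go, Rust, Swift or .NET could implement the same loop against the same engine.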
--
This message was sent by Atlassian Jira
(v8.20.10#820010)