Hi Spark devs,

I would like to call for a vote on the SPIP: Language-Agnostic UDF
Execution Protocol for Spark.

Summary:

The SPIP proposes a structured, language-agnostic execution protocol for
running user-defined functions (UDFs) in Spark across multiple programming
languages.

Today, Spark Connect allows users to write queries from multiple languages,
but support for user-defined functions remains incomplete. In practice,
only Scala, Java, Python, and R have working UDF support, and each relies on
language-specific mechanisms that do not generalize well to other languages
such as Go <https://github.com/apache/spark-connect-go>, Rust
<https://github.com/apache/spark-connect-rust>, Swift
<https://github.com/apache/spark-connect-swift>, TypeScript
<https://github.com/BaldrVivaldelli/ts-spark-connector> or .NET
<https://github.com/GoEddie/spark-connect-dotnet>, where UDF support is
currently unavailable. The existing PySpark worker.py implementation also
has legacy limitations that this proposal could address.

This proposal aims to define a unified API and execution protocol for UDFs
that run outside the Spark executor process and communicate with Spark via
inter-process communication (IPC). The goal is to enable Spark to interact
with external workers in a consistent and extensible way, regardless of the
implementation language.

Links:

SPIP Doc:
https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0

JIRA: https://issues.apache.org/jira/browse/SPARK-55278

Discussion Thread:
https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP

[ ] +0

[ ] -1: I don’t think this is a good idea because…

Thanks to everyone who participated in the discussion and provided valuable
feedback.
