Hi Spark devs,

I would like to call for a vote on the SPIP: Language-Agnostic UDF Execution Protocol for Spark.

Summary:

The SPIP proposes a structured, language-agnostic execution protocol for running user-defined functions (UDFs) in Spark across multiple programming languages.

Today, Spark Connect allows users to write queries from multiple languages, but support for user-defined functions remains incomplete. In practice, only Scala / Java / Python / R have working support, and it relies on language-specific mechanisms that do not generalize well to other languages such as Go <https://github.com/apache/spark-connect-go>, Rust <https://github.com/apache/spark-connect-rust>, Swift <https://github.com/apache/spark-connect-swift>, TypeScript <https://github.com/BaldrVivaldelli/ts-spark-connector> or .NET <https://github.com/GoEddie/spark-connect-dotnet>, where UDF support is currently unavailable. There are also legacy limitations around the existing PySpark worker.py implementation that can be improved with the proposal.

This proposal aims to define a unified API and execution protocol for UDFs that run outside the Spark executor process and communicate with Spark via inter-process communication (IPC). The goal is to enable Spark to interact with external workers in a consistent and extensible way, regardless of the implementation language.

Links:

SPIP Doc: https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
JIRA: https://issues.apache.org/jira/browse/SPARK-55278
Discussion Thread: https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43

Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because…

Thanks to everyone who participated in the discussion and provided valuable feedback.
