Hi Spark devs,

I would like to call for *a new vote, following the previous attempt,* on the *SPIP: Language-Agnostic UDF Execution Protocol for Spark*, after addressing comments and providing a supplementary design document for the worker specification.

The SPIP proposes a structured, language-agnostic framework for running user-defined functions (UDFs) in Spark across multiple programming languages.

Today, Spark Connect allows users to write queries from multiple languages, but support for user-defined functions remains incomplete. In practice, only Scala, Java, and Python have working support, and this relies on language-specific mechanisms that do not generalize well to other languages such as Go <https://github.com/apache/spark-connect-go> / Rust <https://github.com/apache/spark-connect-rust> / Swift <https://github.com/apache/spark-connect-swift> / TypeScript <https://github.com/BaldrVivaldelli/ts-spark-connector>, where UDF support is currently unavailable. In addition, there are legacy limitations in the existing PySpark worker implementation that make it difficult to evolve the system or extend it to new languages.

The proposal introduces two related components:

1. *A unified UDF execution protocol.*
   The proposal defines a structured API and execution protocol for running UDFs outside the Spark executor process and communicating with Spark via inter-process communication (IPC). This protocol enables Spark to interact with external UDF workers in a consistent and extensible way, regardless of the implementation language.

2. *A worker specification for provisioning and lifecycle management.*
   To support multi-language execution environments, the proposal also introduces a worker specification describing how UDF workers can be installed, started, connected to, and terminated. This document complements the SPIP by outlining how workers can be provisioned and managed in a consistent way.

Note that this SPIP can help enable UDF support for languages that currently do not support UDFs.
For languages that already have UDF implementations (especially Python), the goal is not to replace existing implementations immediately, but to provide a framework that may allow them to gradually evolve toward more language-agnostic abstractions over time.

More details can be found in the SPIP document and the supplementary design for worker specification:

SPIP: https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8
Worker specification design document: https://docs.google.com/document/d/1Dx9NqHRNuUpatH9DYoFF9cmvUl2fqHT4Rjbyw4EGLHs
Discussion Thread: https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43
Previous vote and discussion thread: https://lists.apache.org/thread/81xghrfwvopp274rgyxfthsstb2xmkz1

*Please vote on adopting this proposal.*

[ ] +1: Accept the proposal as an official SPIP
[ ] +0: No opinion
[ ] -1: Disapprove (please explain why)

The vote will remain open for *at least 72 hours*.

Thanks to everyone who participated in the discussion and provided valuable feedback!

Best regards,
Haiyang
