To be clear, even without the Python concerns, I don’t think this SPIP is ready for a vote. Let’s move this back to the discussion stage.
Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

On Wed, Feb 25, 2026 at 8:02 PM Wenchen Fan <[email protected]> wrote:

> Can we focus on new languages (non-Python and Scala) for this project?
> Then the baseline is "not supported today" and there is no concern of
> performance regression. We can start another vote when we migrate Python
> UDF to this new framework, with perf numbers from a working prototype.
>
> On Thu, Feb 26, 2026 at 6:13 AM Holden Karau <[email protected]>
> wrote:
>
>> To be clear, to meet the technical justification requirement, my
>> reasons are as follows:
>> 1) I believe the security story needs to be fleshed out; we're adding a
>> new IPC mechanism, so we should be careful we don't do anything wrong
>> 2) Performance assumptions are too vague (e.g. "overhead is minimal"
>> without any numbers does not match my experiences or benchmarks)
>> 3) The fallback / migration strategy for existing Python users should be
>> explicit
>> 4) Worker specification is largely missing
>> 5) Dependency management is unclear, which is understandable for new
>> languages, but we should at least have a clear story for Python and a
>> gut check on whether similar things might work in other languages
>> 6) Unified query planning for UDFs between languages seems likely to be
>> an area of performance regression; different languages have different
>> behaviour, so we should have some flexibility even at the planning stage
>> 7) Inter-UDF pipelining is overlooked
>>
>> I've left more detailed versions of these comments in the doc.
>>
>> Broadly speaking I do like this idea, but I feel that it's not clear
>> enough yet to be adopted. I look forward to a future iteration of this
>> which I can vote yes on.
>>
>> On Wed, Feb 25, 2026 at 1:40 PM Holden Karau <[email protected]>
>> wrote:
>>
>>> -1: I like the idea but I think we need more discussion first.
>>>
>>> On Wed, Feb 25, 2026 at 1:40 PM DB Tsai <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>
>>>> On Feb 25, 2026, at 9:33 AM, Daniel Tenedorio <[email protected]>
>>>> wrote:
>>>>
>>>> +1 (non-binding), this should make Spark's interfaces better and
>>>> simplify the PySpark UDF protocols. Thanks for preparing this!
>>>>
>>>> On 2026/02/25 16:12:08 Herman van Hovell via dev wrote:
>>>>
>>>> Hi Spark devs,
>>>>
>>>> I would like to call for a vote on the SPIP: Language-Agnostic UDF
>>>> Execution Protocol for Spark.
>>>>
>>>> Summary:
>>>>
>>>> The SPIP proposes a structured, language-agnostic execution protocol
>>>> for running user-defined functions (UDFs) in Spark across multiple
>>>> programming languages.
>>>>
>>>> Today, Spark Connect allows users to write queries from multiple
>>>> languages, but support for user-defined functions remains incomplete.
>>>> In practice, only Scala / Java / Python / R have working support, and
>>>> it relies on language-specific mechanisms that do not generalize well
>>>> to other languages such as
>>>> Go <https://github.com/apache/spark-connect-go>,
>>>> Rust <https://github.com/apache/spark-connect-rust>,
>>>> Swift <https://github.com/apache/spark-connect-swift>,
>>>> TypeScript <https://github.com/BaldrVivaldelli/ts-spark-connector> or
>>>> .NET <https://github.com/GoEddie/spark-connect-dotnet>,
>>>> where UDF support is currently unavailable. There are also legacy
>>>> limitations around the existing PySpark worker.py implementation that
>>>> can be improved with the proposal.
>>>>
>>>> This proposal aims to define a unified API and execution protocol for
>>>> UDFs that run outside the Spark executor process and communicate with
>>>> Spark via inter-process communication (IPC).
>>>> The goal is to enable Spark to interact with external workers in a
>>>> consistent and extensible way, regardless of the implementation
>>>> language.
>>>>
>>>> Links:
>>>>
>>>> SPIP Doc:
>>>> https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
>>>>
>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55278
>>>>
>>>> Discussion Thread:
>>>> https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>
>>>> [ ] +0
>>>>
>>>> [ ] -1: I don’t think this is a good idea because…
>>>>
>>>> Thanks to everyone who participated in the discussion and provided
>>>> valuable feedback.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: [email protected]
