To be clear, even without the Python concerns, I don’t think this SPIP is
ready for a vote. Let’s move this back to the discussion stage.


Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her


On Wed, Feb 25, 2026 at 8:02 PM Wenchen Fan <[email protected]> wrote:

> Can we focus on new languages (neither Python nor Scala) for this project?
> Then the baseline is "not supported today" and there is no concern about
> performance regression. We can start another vote when we migrate Python
> UDFs to this new framework, with perf numbers from a working prototype.
>
> On Thu, Feb 26, 2026 at 6:13 AM Holden Karau <[email protected]>
> wrote:
>
>> To be clear, the technical justifications for my -1 are as follows:
>> 1) I believe the security story needs to be fleshed out: we're adding a
>> new IPC mechanism, so we should be careful we don't do anything wrong
>> 2) Performance assumptions are too vague (e.g. "overhead is minimal"
>> without any numbers does not match my experiences or benchmarks)
>> 3) The fallback / migration strategy for existing Python users should be
>> explicit
>> 4) Worker specification is largely missing
>> 5) Dependency management is unclear, which is understandable for new
>> languages, but we should at least have a clear story for Python and a gut
>> check on whether similar things might work in other languages
>> 6) Unified query planning for UDFs between languages seems likely to be
>> an area of performance regression: different languages have different
>> behaviour, so we should have some flexibility even at the planning stage
>> 7) Inter-UDF pipelining is overlooked
>>
>> I've left more detailed versions of these comments in the doc.
>>
>> Broadly speaking, I do like this idea, but I feel that it's not clear
>> enough yet to be adopted. I look forward to a future iteration of this
>> which I can vote yes on.
>>
>> On Wed, Feb 25, 2026 at 1:40 PM Holden Karau <[email protected]>
>> wrote:
>>
>>> -1: I like the idea but I think we need more discussion first.
>>>
>>> On Wed, Feb 25, 2026 at 1:40 PM DB Tsai <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>>
>>>> On Feb 25, 2026, at 9:33 AM, Daniel Tenedorio <[email protected]>
>>>> wrote:
>>>>
>>>> +1 (non-binding), this should make Spark's interfaces better and
>>>> simplify the PySpark UDF protocols. Thanks for preparing this!
>>>>
>>>> On 2026/02/25 16:12:08 Herman van Hovell via dev wrote:
>>>>
>>>> Hi Spark devs,
>>>>
>>>> I would like to call for a vote on the SPIP: Language-Agnostic UDF
>>>> Execution Protocol for Spark.
>>>>
>>>> Summary:
>>>>
>>>> The SPIP proposes a structured, language-agnostic execution protocol for
>>>> running user-defined functions (UDFs) in Spark across multiple
>>>> programming languages.
>>>>
>>>> Today, Spark Connect allows users to write queries from multiple
>>>> languages, but support for user-defined functions remains incomplete.
>>>> In practice, only Scala / Java / Python / R have working support, and it
>>>> relies on language-specific mechanisms that do not generalize well to
>>>> other languages such as Go <https://github.com/apache/spark-connect-go>,
>>>> Rust <https://github.com/apache/spark-connect-rust>, Swift
>>>> <https://github.com/apache/spark-connect-swift>, TypeScript
>>>> <https://github.com/BaldrVivaldelli/ts-spark-connector> or .NET
>>>> <https://github.com/GoEddie/spark-connect-dotnet>, where UDF support is
>>>> currently unavailable. There are also legacy limitations around the
>>>> existing PySpark worker.py implementation that can be improved with the
>>>> proposal.
>>>>
>>>> This proposal aims to define a unified API and execution protocol for
>>>> UDFs that run outside the Spark executor process and communicate with
>>>> Spark via inter-process communication (IPC). The goal is to enable Spark
>>>> to interact with external workers in a consistent and extensible way,
>>>> regardless of the implementation language.
>>>>
>>>> Links:
>>>>
>>>> SPIP Doc:
>>>>
>>>> https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
>>>>
>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55278
>>>>
>>>> Discussion Thread:
>>>> https://lists.apache.org/thread/9t4svsnd71j7sb4r4scf2xhh8dvp3b43
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>
>>>> [ ] +0
>>>>
>>>> [ ] -1: I don’t think this is a good idea because…
>>>>
>>>> Thanks to everyone who participated in the discussion and provided
>>>> valuable feedback.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
