Hi Holden,

Thank you for the reply.
> I'm curious about how this lines up with
> https://github.com/apache/spark/pull/53435/changes

The proposal should be transparent to that PR, as long as Arrow is used as the format for data transmission.

> I'd love to see more zero copy solutions. Perhaps both?

Sure. During implementation, we should definitely avoid extra copies to reduce IPC costs at every stage. The PR you mention tries to avoid copies during the Arrow-to-pandas conversion (consuming the Arrow batches), which happens at a later stage than the proposed UDF protocol (which receives Arrow batches from Spark). But I agree that we should aim for zero-copy in both.

On Mon, Feb 23, 2026 at 9:07 PM Holden Karau <[email protected]> wrote:

> I'm curious about how this lines up with
> https://github.com/apache/spark/pull/53435/changes , I'd love to see more
> zero copy solutions and this seems more IPC oriented. Perhaps both?
>
> On Thu, Feb 19, 2026 at 11:42 AM Lisa N. Cao <[email protected]> wrote:
>
>> Seems like a good way upfront to avoid language-specific headaches in
>> the future.
>>
>> --
>> LNC
>>
>> On Thu, Feb 19, 2026 at 11:03 AM Ángel Álvarez Pascua <[email protected]> wrote:
>>
>>> Hmmm... sounds like a great idea to me!
>>>
>>> On Thu, Feb 19, 2026, 19:47, Haiyang Sun via dev <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'd like to start a discussion on a draft SPIP: Language-agnostic UDF
>>>> Protocol for Spark.
>>>>
>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55278
>>>>
>>>> Doc: https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
>>>>
>>>> tl;dr
>>>>
>>>> The SPIP proposes a structured, language-agnostic execution protocol
>>>> for running user-defined functions (UDFs) in Spark across multiple
>>>> programming languages.
>>>>
>>>> Today, Spark Connect allows users to write queries from multiple
>>>> languages, but support for user-defined functions remains incomplete.
>>>> In practice, only Scala, Java, Python, and R have working support, and
>>>> it relies on language-specific mechanisms that do not generalize well
>>>> to other languages such as Go (Apache Spark Connect Go
>>>> <https://github.com/apache/spark-connect-go>), Rust (Apache Spark
>>>> Connect Rust <https://github.com/apache/spark-connect-rust>), Swift
>>>> (Apache Spark Connect Swift
>>>> <https://github.com/apache/spark-connect-swift>), or .NET (Spark
>>>> Connect DotNet <https://github.com/GoEddie/spark-connect-dotnet>),
>>>> where UDF support is currently unavailable. There are also legacy
>>>> limitations in the existing PySpark worker.py implementation that this
>>>> proposal can address.
>>>>
>>>> This proposal aims to define a unified API and execution protocol for
>>>> UDFs that run outside the Spark executor process and communicate with
>>>> Spark via inter-process communication (IPC). The goal is to enable
>>>> Spark to interact with external workers in a consistent and extensible
>>>> way, regardless of the implementation language.
>>>>
>>>> I'm happy to help drive the discussion and development of this
>>>> proposal, and I would greatly appreciate feedback from the community.
>>>>
>>>> Thanks,
>>>>
>>>> Haiyang Sun
>
> --
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
