It's an excellent idea!

I've hit the same problem with the TypeScript Spark client I wrote: I can't
implement UDFs because of this limitation.

https://github.com/BaldrVivaldelli/ts-spark-connector
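For what it's worth, here is a minimal sketch (my own, not taken from the
SPIP doc) of the kind of length-prefixed IPC framing such a protocol implies.
In the actual proposal the payload would be an Arrow record batch; plain
bytes stand in here so the example needs only the Python standard library.

```python
# Hypothetical sketch of a length-prefixed IPC frame that a
# language-agnostic UDF worker might exchange with Spark.
# The real proposal would carry Arrow record batches as the payload;
# plain bytes are used here so the example is self-contained.
import socket
import struct

def send_frame(sock: socket.socket, payload: bytes) -> None:
    # 4-byte big-endian length prefix, then the payload
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the connection")
        buf += chunk
    return buf

def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Simulate executor <-> worker over a socket pair
executor, worker = socket.socketpair()
send_frame(executor, b"hello batch")   # executor ships a "batch"
request = recv_frame(worker)           # worker receives it
send_frame(worker, request.upper())    # stand-in "UDF": uppercase
result = recv_frame(executor)          # executor reads the result
print(result)  # b'HELLO BATCH'
executor.close()
worker.close()
```

The point being: once the framing and payload format are fixed, the worker
side can be written in any language that can read bytes from a socket, which
is exactly what clients like mine need.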

On Mon, 23 Feb 2026 at 17:58, H Sun <[email protected]> wrote:

> Hi Holden,
>
> Thank you for the reply.
>
> > I'm curious about how this lines up with
> https://github.com/apache/spark/pull/53435/changes
>
> The proposal should be transparent to this PR, as long as Arrow is used
> as the format for data transmission.
>
> > I'd love to see more zero copy solutions. Perhaps both?
>
> Certainly. During implementation we should avoid extra copies to reduce
> IPC costs at every stage. The PR you mentioned avoids copies during the
> Arrow-to-pandas conversion (consuming the Arrow batches), which happens at
> a later stage than the proposed UDF protocol (which receives Arrow batches
> from Spark). But I agree that we should aim for zero copies in both.
>
> On Mon, Feb 23, 2026 at 9:07 PM Holden Karau <[email protected]>
> wrote:
>
>> I'm curious about how this lines up with
>> https://github.com/apache/spark/pull/53435/changes , I'd love to see
>> more zero copy solutions and this seems more IPC oriented. Perhaps both?
>>
>> On Thu, Feb 19, 2026 at 11:42 AM Lisa N. Cao <[email protected]>
>> wrote:
>>
>>> Seems like a good way to avoid language-specific headaches up front.
>>>
>>> --
>>> LNC
>>>
>>>
>>> On Thu, Feb 19, 2026 at 11:03 AM Ángel Álvarez Pascua <
>>> [email protected]> wrote:
>>>
>>>> Hmmm.... sounds like a great idea to me!
>>>>
>>>> On Thu, 19 Feb 2026 at 19:47, Haiyang Sun via dev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I’d like to start a discussion on a draft SPIP: Language-agnostic UDF
>>>>> Protocol for Spark
>>>>>
>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55278
>>>>>
>>>>> Doc:
>>>>> https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
>>>>>
>>>>> tl;dr
>>>>>
>>>>> The SPIP proposes a structured, language-agnostic execution protocol
>>>>> for running user-defined functions (UDFs) in Spark across multiple
>>>>> programming languages.
>>>>>
>>>>> Today, Spark Connect allows users to write queries from multiple
>>>>> languages, but support for user-defined functions remains incomplete. In
>>>>> practice, only Scala / Java / Python / R have working support, and it
>>>>> relies on language-specific mechanisms that do not generalize well to 
>>>>> other
>>>>> languages such as Go (Apache Spark Connect Go
>>>>> <https://github.com/apache/spark-connect-go>), Rust (Apache Spark
>>>>> Connect Rust <https://github.com/apache/spark-connect-rust>), Swift 
>>>>> (Apache
>>>>> Spark Connect Swift <https://github.com/apache/spark-connect-swift>),
>>>>> or .NET (Spark Connect DotNet
>>>>> <https://github.com/GoEddie/spark-connect-dotnet>), where UDF support
>>>>> is currently unavailable. There are also legacy limitations in the
>>>>> existing PySpark worker.py implementation that the proposal could
>>>>> improve.
>>>>>
>>>>> This proposal aims to define a unified API and execution protocol for
>>>>> UDFs that run outside the Spark executor process and communicate with 
>>>>> Spark
>>>>> via inter-process communication (IPC). The goal is to enable Spark to
>>>>> interact with external workers in a consistent and extensible way,
>>>>> regardless of the implementation language.
>>>>>
>>>>> I’m happy to help drive the discussion and development of this
>>>>> proposal, and I would greatly appreciate feedback from the community.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Haiyang Sun
>>>>>
>>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>
