It's an excellent idea!
I hit the same problem with the Spark client in TS that I made; I cannot implement UDFs because of this issue.
https://github.com/BaldrVivaldelli/ts-spark-connector

On Mon, 23 Feb 2026 at 17:58, H Sun <[email protected]> wrote:

> Hi Holden,
>
> Thank you for the reply.
>
> > I'm curious about how this lines up with
> > https://github.com/apache/spark/pull/53435/changes
>
> The proposal should be transparent to this PR, as long as Arrow is used
> as the format for data transmission.
>
> > I'd love to see more zero copy solutions. Perhaps both?
>
> Surely. During implementation, we should definitely avoid extra copies to
> reduce IPC costs in all stages. The PR you mentioned tries to avoid
> copies during the Arrow-to-pandas conversion (consuming the Arrow
> batches), which is a later stage than the proposed UDF protocol (which
> receives Arrow batches from Spark). But I agree that we should try to be
> zero-copy in both.
>
> On Mon, Feb 23, 2026 at 9:07 PM Holden Karau <[email protected]>
> wrote:
>
>> I'm curious about how this lines up with
>> https://github.com/apache/spark/pull/53435/changes , I'd love to see
>> more zero copy solutions and this seems more IPC oriented. Perhaps both?
>>
>> On Thu, Feb 19, 2026 at 11:42 AM Lisa N. Cao <[email protected]>
>> wrote:
>>
>>> Seems like a good way upfront to avoid lang-specific headaches in the
>>> future.
>>>
>>> --
>>> LNC
>>>
>>> On Thu, Feb 19, 2026 at 11:03 AM Ángel Álvarez Pascua <
>>> [email protected]> wrote:
>>>
>>>> Hmmm... sounds like a great idea to me!
>>>>
>>>> On Thu, 19 Feb 2026 at 19:47, Haiyang Sun via dev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to start a discussion on a draft SPIP: Language-agnostic UDF
>>>>> Protocol for Spark.
>>>>>
>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-55278
>>>>>
>>>>> Doc:
>>>>> https://docs.google.com/document/d/19Whzq127QxVt2Luk0EClgaDtcpBsFUp67NcVdKKyPF8/edit?tab=t.0
>>>>>
>>>>> tl;dr
>>>>>
>>>>> The SPIP proposes a structured, language-agnostic execution protocol
>>>>> for running user-defined functions (UDFs) in Spark across multiple
>>>>> programming languages.
>>>>>
>>>>> Today, Spark Connect allows users to write queries from multiple
>>>>> languages, but support for user-defined functions remains incomplete.
>>>>> In practice, only Scala / Java / Python / R have working support, and
>>>>> it relies on language-specific mechanisms that do not generalize well
>>>>> to other languages such as Go (Apache Spark Connect Go
>>>>> <https://github.com/apache/spark-connect-go>), Rust (Apache Spark
>>>>> Connect Rust <https://github.com/apache/spark-connect-rust>), Swift
>>>>> (Apache Spark Connect Swift
>>>>> <https://github.com/apache/spark-connect-swift>), or .NET (Spark
>>>>> Connect DotNet <https://github.com/GoEddie/spark-connect-dotnet>),
>>>>> where UDF support is currently unavailable. There are also legacy
>>>>> limitations in the existing PySpark worker.py implementation that the
>>>>> proposal can improve on.
>>>>>
>>>>> This proposal aims to define a unified API and execution protocol for
>>>>> UDFs that run outside the Spark executor process and communicate with
>>>>> Spark via inter-process communication (IPC). The goal is to enable
>>>>> Spark to interact with external workers in a consistent and
>>>>> extensible way, regardless of the implementation language.
>>>>>
>>>>> I'm happy to help drive the discussion and development of this
>>>>> proposal, and I would greatly appreciate feedback from the community.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Haiyang Sun
>>>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://www.fighthealthinsurance.com/?q=hk_email>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
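P.S. For anyone curious what "external workers communicating over IPC" could look like in practice: this is NOT the SPIP's actual wire format (the linked doc defines that), just a minimal stdlib-only Python sketch of the kind of length-prefixed message framing such a protocol might use, with the Arrow payload replaced by opaque bytes. The names write_frame, read_frame, and echo_upper_worker are hypothetical.

```python
import struct
from io import BytesIO

# Hypothetical framing: each message on the IPC channel is a 4-byte
# big-endian length prefix followed by an opaque payload (in the real
# proposal the payload would be an Arrow IPC stream, not raw bytes).

def write_frame(stream, payload: bytes) -> None:
    """Write one length-prefixed message to the channel."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_frame(stream) -> bytes:
    """Read one length-prefixed message from the channel."""
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)

def echo_upper_worker(inbound, outbound) -> None:
    """Toy stand-in for a UDF worker: uppercase the payload and reply."""
    payload = read_frame(inbound)
    write_frame(outbound, payload.upper())

# Simulate the two ends of the pipe with in-memory buffers.
to_worker, from_worker = BytesIO(), BytesIO()
write_frame(to_worker, b"hello spark")
to_worker.seek(0)
echo_upper_worker(to_worker, from_worker)
from_worker.seek(0)
print(read_frame(from_worker).decode())  # HELLO SPARK
```

The point of a framing layer like this is that any language with sockets and byte buffers (Go, Rust, Swift, .NET, TS) can implement it, which is exactly what the per-language worker.py-style mechanisms prevent today.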
