Let me take a look. Shouldn't be a major issue.

On Wed, 22 Jan 2025 at 08:31, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> As discussed on a thread over the weekend, we agreed among us, including
> Matei, on a shift towards more stable and version-independent APIs. Spark
> Connect IMO is a key enabler of this shift, allowing users and developers
> to build applications and libraries that are more resilient to changes in
> Spark's internals, as opposed to RDDs. Moreover, maintaining backward
> compatibility for the existing RDD-based applications and libraries is
> crucial during this transition window, so the timeframe is another factor
> for consideration.
>
> HTH
>
> Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
> On Tue, 21 Jan 2025 at 22:40, Holden Karau <holden.ka...@gmail.com> wrote:
>
>> Interesting. So given that one of the features of Spark Connect should be
>> simpler migrations, we should (in my mind) only declare it stable once
>> we've gone through two releases where the previous client + its code can
>> talk to the new server.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Tue, Jan 21, 2025 at 12:31 PM Dongjoon Hyun <dongj...@apache.org>
>> wrote:
>>
>>> It seems that there is misinformation about the stability of Spark
>>> Connect in Spark 4. I would like to reduce that gap on our dev mailing
>>> list.
>>>
>>> Frequently, some people claim `Spark Connect` is stable because it uses
>>> Protobuf. Yes, we standardized the interface layer. However, may I ask
>>> whether that implies the implementation is stable?
>>>
>>> Since Apache Spark is an open source community, you can see the
>>> stability of the implementation in our public CI. In our CI, the PySpark
>>> Connect client has been technically broken most of the time.
>>>
>>> 1.
>>> https://github.com/apache/spark/actions/workflows/build_python_connect.yml
>>> (Spark Connect Python-only, in master)
>>>
>>> In addition, the Spark 3.5 client seems to face another difficulty
>>> talking to the Spark 4 server.
>>>
>>> 2.
>>> https://github.com/apache/spark/actions/workflows/build_python_connect35.yml
>>> (Spark Connect Python-only: master server, 3.5 client)
>>>
>>> 3. What about stability and feature parity in the other language
>>> clients? Do they work well with Apache Spark 4? Is there any way for the
>>> Apache Spark community to assess this?
>>>
>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is
>>> stable or ready in Spark 4? From my perspective, it is still actively
>>> under development with an open end.
>>>
>>> The bottom line is that `Spark Connect` needs more community love in
>>> order to be claimed as Stable in Apache Spark 4. I'm looking forward to
>>> seeing a healthy Spark Connect CI in Spark 4. Until then, let's clarify
>>> what is stable in `Spark Connect` and what is not yet.
>>>
>>> Best Regards,
>>> Dongjoon.
>>>
>>> PS.
>>> This is a separate thread from the previous flakiness issue:
>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq
>>> ([FYI] Known `Spark Connect` Test Suite Flakiness)
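For readers less familiar with the architecture under discussion, here is a minimal PySpark sketch of the client/server split, assuming a Spark Connect server is already listening on the default port 15002 (for example, one started with sbin/start-connect-server.sh):

    from pyspark.sql import SparkSession

    # Connect to a remote Spark Connect server instead of starting a local
    # driver; "sc://" is the Spark Connect URI scheme, 15002 the default port.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

    # DataFrame operations are resolved into Protobuf plans and sent to the
    # server over gRPC; the client does not link against the server's JVM
    # internals.
    spark.range(10).filter("id % 2 == 0").show()

That decoupling of client from server internals, rather than the Protobuf definitions alone, is what the version-independence argument and the cross-version question (a 3.5 client talking to a Spark 4 server) in the thread above rest on.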