Thank you, Hyukjin!

Dongjoon

On Tue, Jan 21, 2025 at 16:10 Hyukjin Kwon <gurwls...@apache.org> wrote:

> Just a quick note on that: there are two major reasons. 1. An OOM, for
> which we should figure out and fix the CI environment. 2. A Structured
> Streaming test failure in code that is still in development.
> I made an umbrella JIRA (https://issues.apache.org/jira/browse/SPARK-50907),
> and I will work there. It should be easier to see what the actual issue
> was there.
>
> On Wed, 22 Jan 2025 at 09:04, Hyukjin Kwon <gurwls...@apache.org> wrote:
>
>> Let me take a look. It shouldn't be a major issue.
>>
>> On Wed, 22 Jan 2025 at 08:31, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> As discussed on a thread over the weekend, we agreed among us, including
>>> Matei, on a shift towards more stable and version-independent APIs.
>>> Spark Connect IMO is a key enabler of this shift, allowing users and
>>> developers to build applications and libraries that are more resilient to
>>> changes in Spark's internals than those built on RDDs. Moreover, maintaining
>>> backward compatibility for the existing RDD-based applications and
>>> libraries is crucial during this transition window, so the timeframe is
>>> another factor for consideration.
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> On Tue, 21 Jan 2025 at 22:40, Holden Karau <holden.ka...@gmail.com>
>>> wrote:
>>>
>>>> Interesting. So, given that one of the features of Spark Connect should be
>>>> simpler migrations, we should (in my mind) only declare it stable once we’ve
>>>> gone through two releases where the previous client + its code can talk to
>>>> the new server.
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> Pronouns: she/her
>>>>
>>>>
>>>> On Tue, Jan 21, 2025 at 12:31 PM Dongjoon Hyun <dongj...@apache.org>
>>>> wrote:
>>>>
>>>>> It seems that there is misinformation about the stability of Spark
>>>>> Connect in Spark 4. I would like to reduce the gap in our dev mailing list.
>>>>>
>>>>> Frequently, some people claim `Spark Connect` is stable because it
>>>>> uses Protobuf. Yes, we standardize the interface layer. However, may I ask
>>>>> if it implies its implementation's stability?
>>>>>
>>>>> Since Apache Spark is an open source community, you can see the
>>>>> stability of implementation in our public CI. In our CI, the PySpark
>>>>> Connect client has been technically broken most of the time.
>>>>>
>>>>> 1.
>>>>> https://github.com/apache/spark/actions/workflows/build_python_connect.yml
>>>>> (Spark Connect Python-only in master)
>>>>>
>>>>> In addition, the Spark 3.5 client seems to face another difficulty
>>>>> talking with the Spark 4 server.
>>>>>
>>>>> 2.
>>>>> https://github.com/apache/spark/actions/workflows/build_python_connect35.yml
>>>>> (Spark Connect Python-only:master-server, 35-client)
>>>>>
>>>>> 3. What about stability and feature parity across different
>>>>> languages? Do they work well with Apache Spark 4? I'm wondering if there
>>>>> is any clue for the Apache Spark community to make an assessment.
>>>>>
>>>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is
>>>>> stable or ready in Spark 4? From my perspective, this is still actively
>>>>> under development with an open end.
>>>>>
>>>>> The bottom line is `Spark Connect` needs more community love in order
>>>>> to be claimed as Stable in Apache Spark 4. I'm looking forward to seeing
>>>>> the healthy Spark Connect CI in Spark 4. Until then, let's clarify what is
>>>>> stable in `Spark Connect` and what is not yet.
>>>>>
>>>>> Best Regards,
>>>>> Dongjoon.
>>>>>
>>>>> PS.
>>>>> This is a separate thread from the previous flakiness issues.
>>>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq
>>>>> ([FYI] Known `Spark Connect` Test Suite Flakiness)
>>>>>
>>>>
