I'm concerned that the term "Classic" may carry a negative connotation.

On Mon, Jul 22, 2024 at 5:11 PM Hyukjin Kwon <gurwls...@apache.org> wrote:

> Yeah that's what I intended. Thanks for clarification.
>
> Let me start the vote
>
>
> On Tue, 23 Jul 2024 at 08:14, Sadha Chilukoori <sage.quoti...@gmail.com>
> wrote:
>
>> Hi Dongjoon,
>>
>> *To be clear, is the proposal aiming to have us say A instead of B in our
>> documentation?*
>>
>> *A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> Classic` mode instead.*
>> *B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>> without Spark Connect` mode instead*.
>>
>>
>> Correct, the thread recommends using option A consistently throughout the
>> documentation.
>>
>> -Sadha
>>
>> On Mon, Jul 22, 2024, 10:25 AM Dongjoon Hyun <dongj...@apache.org> wrote:
>>
>>> Thank you for opening this thread, Hyukjin.
>>>
>>> In this discussion thread, we have three terminologies, (1) ~ (3).
>>>
>>>     > Spark Classic (vs. Spark Connect)
>>>
>>> 1. Spark
>>> 2. Spark Classic (= A proposal for Spark without Spark Connect)
>>> 3. Spark Connect
>>>
>>> As Holden and Jungtaek mentioned,
>>>
>>> - (1) is clearly the existing code base, which includes everything (the
>>> RDD API, Spark Thrift Server, Spark Connect, and so on).
>>>
>>> - (3) is a very specific use case: a Spark binary distribution used with
>>> the `--remote` option (or with the related features enabled). Like Spark
>>> Thrift Server, after the query planning steps there is no fundamental
>>> difference on the execution side in Spark clusters or Spark jobs.
>>>
>>> - (2) By the proposed definition, `Spark Classic` is not (1) `Spark`.
>>> Like `--remote`, it is one of the runnable modes.
>>>
>>> To be clear, is the proposal aiming to have us say A instead of B in our
>>> documentation?
>>>
>>> A. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> Classic` mode instead.
>>> B. Since `Spark Connect` mode has no RDD API, we need to use `Spark
>>> without Spark Connect` mode instead.
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
>>> > +1  (non-binding) for classic.
>>> >
>>> > On Mon, Jul 22, 2024 at 3:59 AM Martin Grund <mar...@databricks.com.invalid>
>>> > wrote:
>>> >
>>> > > +1 for classic. It's simple, easy to understand, and it doesn't carry
>>> > > the negative connotations of, say, "legacy".
>>> > >
>>> > > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan <cloud0...@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> Classic SGTM.
>>> > >>
>>> > >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
>>> > >> kabhwan.opensou...@gmail.com> wrote:
>>> > >>
>>> > >>> I'd propose not to change the name of "Spark Connect" - the name
>>> > >>> represents the characteristic of the mode (the separation of client
>>> > >>> and server layers). Removing the "Connect" part would just create
>>> > >>> confusion.
>>> > >>>
>>> > >>> +1 for Classic to existing mode, till someone comes up with better
>>> > >>> alternatives.
>>> > >>>
>>> > >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon <gurwls...@apache.org>
>>> > >>> wrote:
>>> > >>>
>>> > >>>> I was thinking about a similar option too, but I ended up giving
>>> > >>>> it up. It's quite unlikely at this moment, but suppose we add
>>> > >>>> another Spark Connect-ish component in the far future; it would be
>>> > >>>> challenging to come up with yet another name. Another case is that
>>> > >>>> we might have to cope with distinctions like Spark Connect vs.
>>> > >>>> Spark (with Spark Connect) vs. Spark (without Spark Connect).
>>> > >>>>
>>> > >>>> On Sun, 21 Jul 2024 at 09:59, Holden Karau <
>>> holden.ka...@gmail.com>
>>> > >>>> wrote:
>>> > >>>>
>>> > >>>>> I think perhaps Spark Connect could be phrased as “Basic* Spark”
>>> > >>>>> and existing Spark could be “Full Spark”, given the API
>>> > >>>>> limitations of Spark Connect.
>>> > >>>>>
>>> > >>>>> *I was also thinking “Core” here, but we’ve used core to refer to
>>> > >>>>> the RDD APIs for too long to reuse it here.
>>> > >>>>>
>>> > >>>>> Twitter: https://twitter.com/holdenkarau
>>> > >>>>> Books (Learning Spark, High Performance Spark, etc.):
>>> > >>>>> https://amzn.to/2MaRAG9
>>> > >>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> > >>>>>
>>> > >>>>>
>>> > >>>>> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li <gatorsm...@gmail.com>
>>> > >>>>> wrote:
>>> > >>>>>
>>> > >>>>>> Classic is much better than Legacy. : )
>>> > >>>>>>
>>> > >>>>>> Hyukjin Kwon <gurwls...@apache.org> wrote on Thu, Jul 18, 2024
>>> > >>>>>> at 16:58:
>>> > >>>>>>
>>> > >>>>>>> Hi all,
>>> > >>>>>>>
>>> > >>>>>>> I noticed that we need to standardize our terminology before
>>> > >>>>>>> moving forward. For instance, when documenting, 'Spark without
>>> > >>>>>>> Spark Connect' is too long and verbose. Additionally, I've
>>> > >>>>>>> observed that we use various names for Spark without Spark
>>> > >>>>>>> Connect: Spark Classic, Classic Spark, Legacy Spark, etc.
>>> > >>>>>>>
>>> > >>>>>>> I propose that we consistently refer to it as Spark Classic
>>> > >>>>>>> (vs. Spark Connect).
>>> > >>>>>>>
>>> > >>>>>>> Please share your thoughts on this. Thanks!
>>> > >>>>>>>
>>> > >>>>>>
>>> >
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
