Thank you for opening this thread, Hyukjin.

In this discussion thread, we have three terms, (1) through (3).

    > Spark Classic (vs. Spark Connect)

1. Spark
2. Spark Classic (= the proposed name for Spark without Spark Connect)
3. Spark Connect

As Holden and Jungtaek mentioned, 

- (1) is definitely the existing code base, which includes everything (the RDD 
API, Spark Thrift Server, Spark Connect, and so on). 

- (3) is a very specific use case for a user, where a Spark binary distribution 
is used with the `--remote` option (or with the related features enabled); a 
minimal sketch follows this list. Like the Spark Thrift Server, after the query 
planning steps there is no fundamental difference on the execution side in 
Spark clusters or Spark jobs.

- (2) By the proposed definition, `Spark Classic` is not (1) `Spark`. Like 
`--remote`, it's one of the runnable modes.
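
For readers following the thread, here is a minimal sketch of what the (3) 
`Spark Connect` mode looks like from the user side (PySpark, with a placeholder 
endpoint URL, assuming a Spark Connect server is already running):

    # Spark Connect mode: the session is a thin client talking to a remote
    # Spark Connect server ("sc://localhost:15002" is a placeholder endpoint).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

    # The DataFrame API works as usual; the query plan is shipped to the
    # server, which runs the same execution code as any other Spark job.
    spark.range(10).filter("id % 2 == 0").show()

    # The classic RDD entry point is not available in this mode; accessing
    # spark.sparkContext raises an error under Spark Connect.

The equivalent from the shells is e.g. `./bin/pyspark --remote 
"sc://localhost:15002"`, which is the `--remote` option mentioned above.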

To be clear, is the proposal aiming to have us say A instead of B in our 
documentation?

A. Since `Spark Connect` mode has no RDD API, we need to use `Spark Classic` 
mode instead.
B. Since `Spark Connect` mode has no RDD API, we need to use `Spark without 
Spark Connect` mode instead.

Dongjoon.



On 2024/07/22 12:59:54 Sadha Chilukoori wrote:
> +1  (non-binding) for classic.
> 
> On Mon, Jul 22, 2024 at 3:59 AM Martin Grund <mar...@databricks.com.invalid>
> wrote:
> 
> > +1 for classic. It's simple, easy to understand, and it doesn't have
> > negative connotations like "legacy" does, for example.
> >
> > On Sun, Jul 21, 2024 at 23:48 Wenchen Fan <cloud0...@gmail.com> wrote:
> >
> >> Classic SGTM.
> >>
> >> On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim <
> >> kabhwan.opensou...@gmail.com> wrote:
> >>
> >>> I'd propose not to change the name of "Spark Connect" - the name
> >>> represents the characteristic of the mode (the separation of the client
> >>> and server layers). Trying to remove the "Connect" part would just cause
> >>> confusion.
> >>>
> >>> +1 for Classic for the existing mode, till someone comes up with a better
> >>> alternative.
> >>>
> >>> On Mon, Jul 22, 2024 at 8:50 AM Hyukjin Kwon <gurwls...@apache.org>
> >>> wrote:
> >>>
> >>>> I was thinking about a similar option too, but I ended up giving it up.
> >>>> It's quite unlikely at this moment, but suppose we had another Spark
> >>>> Connect-ish component in the far future; it would be challenging to come
> >>>> up with yet another name. Another case is that we might have to cope with
> >>>> cases like Spark Connect vs. Spark (with Spark Connect) vs. Spark
> >>>> (without Spark Connect).
> >>>>
> >>>> On Sun, 21 Jul 2024 at 09:59, Holden Karau <holden.ka...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> I think perhaps Spark Connect could be phrased as “Basic* Spark” &
> >>>>> existing Spark could be “Full Spark”, given the API limitations of
> >>>>> Spark Connect.
> >>>>>
> >>>>> *I was also thinking Core here but we’ve used core to refer to the RDD
> >>>>> APIs for too long to reuse it here.
> >>>>>
> >>>>> Twitter: https://twitter.com/holdenkarau
> >>>>> Books (Learning Spark, High Performance Spark, etc.):
> >>>>> https://amzn.to/2MaRAG9
> >>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>>>>
> >>>>>
> >>>>> On Sat, Jul 20, 2024 at 8:02 PM Xiao Li <gatorsm...@gmail.com> wrote:
> >>>>>
> >>>>>> Classic is much better than Legacy. : )
> >>>>>>
> >>>>>> Hyukjin Kwon <gurwls...@apache.org> wrote on Thu, Jul 18, 2024 at 16:58:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I noticed that we need to standardize our terminology before moving
> >>>>>>> forward. For instance, when documenting, 'Spark without Spark Connect'
> >>>>>>> is too long and verbose. Additionally, I've observed that we use
> >>>>>>> various names for Spark without Spark Connect: Spark Classic, Classic
> >>>>>>> Spark, Legacy Spark, etc.
> >>>>>>>
> >>>>>>> I propose that we consistently refer to it as Spark Classic (vs.
> >>>>>>> Spark Connect).
> >>>>>>>
> >>>>>>> Please share your thoughts on this. Thanks!
> >>>>>>>
> >>>>>>
> 
