For the RDD part, I also disagree with Martin. I believe the RDD API should be supported permanently as a public API. Otherwise, it would be a surprise to me and my colleagues, at least.
> I would assume that we all agree that > 99% of the _new_ users in Spark should not try to write code in RDDs.

Given this long discussion, I have also decided to switch my vote from +1 to -1, because it seems too early to make this decision given the pending `Spark Connect` work and the still-active discussion. Previously, I was focused too much on only the SQL part.

As a side note, I hope the Apache Spark 4.0.0 release is not going to be blocked by the pending `Spark Connect` work and decision.

Dongjoon.

On Tue, Dec 3, 2024 at 7:51 PM Holden Karau <holden.ka...@gmail.com> wrote:
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
> On Fri, Nov 29, 2024 at 12:24 AM Martin Grund <mar...@databricks.com> wrote:
>
>> At the risk of repeating what Herman said word for word :) I would like to call out the following:
>>
>> 1. The goal of setting the default is to guide users to the Spark SQL APIs that have proven themselves over time. We shouldn't underestimate the power of the default. I would assume that we all agree that 99% of the _new_ users in Spark should not try to write code in RDDs.
>
> I would disagree here. Maybe more like 75%.
>
>> 2. Any user, organization, or vendor can leverage *all* of their existing code by simply changing *one* configuration during startup: switching spark.api.mode to classic (e.g., similar to ANSI mode). This means all existing RDD and library code just works.
>>
>> 3. Creating a fractured user experience by using some logic to identify which API mode is used is not ideal.
>> For many of the use cases that I've seen that require additional jars (e.g., data sources, drivers), they just work because Spark already has the right abstractions. JARs used in the client-side part of the code also just work, as Herman said.
>
> Introducing a config flag that defaults to a limited API already introduces a fractured user experience, where an application may fail partway through running.
>
>> Similarly, based on the experience of running Spark Connect in production, the coexistence of workloads running in classic mode and connect mode works fine.
>
> I still don't like "classic" mode (maybe "full" and "restricted").
>
>> On Fri, Nov 29, 2024 at 3:18 AM Holden Karau <holden.ka...@gmail.com> wrote:
>>
>>> I would switch to +0 if connect were the default only for apps without any user-provided jars / non-JVM apps.
>>>
>>> On Thu, Nov 28, 2024 at 6:11 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>
>>>> Given there is no plan to support RDDs, I'll update to -0.9.
>>>>
>>>> On Thu, Nov 28, 2024 at 6:00 PM Herman van Hovell <her...@databricks.com> wrote:
>>>>
>>>>> Hi Holden and Mridul,
>>>>>
>>>>> Just to be clear.
>>>>> What API parity are you expecting here? We have parity for everything that is exposed in org.apache.spark.sql. Connect does not support RDDs, SparkContext, etc. There are currently no plans to support these. We are considering adding a compatibility layer, but it will be limited in scope. From running Connect in production for the last year, we see that most users can migrate their workloads without any problems.
>>>>>
>>>>> I do want to call out that this proposal is mostly aimed at how new users will interact with Spark. Existing users, when they migrate their application to Spark 4, have to set a conf if it turns out their application is not working. This should be a minor inconvenience compared to the headaches that a new Scala version or other library upgrades can cause.
>>>>>
>>>>> Since this is a breaking change, I do think this should be done in a major version.
>>>>>
>>>>> At the risk of repeating the SPIP: using Connect as the default brings a lot to the table (e.g., simplicity, easier upgrades, extensibility, etc.), so I'd urge you to also factor this into your decision making.
>>>>>
>>>>> Happy Thanksgiving!
>>>>>
>>>>> Cheers,
>>>>> Herman
>>>>>
>>>>> On Thu, Nov 28, 2024 at 8:43 PM Mridul Muralidharan <mri...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I agree with Holden; I am leaning -1 on the proposal as well. Unlike the removal of deprecated features, which we align on a major version boundary, changing the default is something we could also do in a minor version - once there is API parity.
>>>>>>
>>>>>> Whichever major or minor version we make the switch in, there could be user impact; minimizing this impact would be greatly appreciated by our users.
>>>>>> Regards,
>>>>>> Mridul
>>>>>>
>>>>>> On Wed, Nov 27, 2024 at 8:31 PM Holden Karau <holden.ka...@gmail.com> wrote:
>>>>>>
>>>>>>> -0.5: I don't think this is a good idea for JVM apps until we have API parity. (Binding, but to be clear, not a veto.)
>>>>>>>
>>>>>>> On Wed, Nov 27, 2024 at 6:27 PM Xinrong Meng <xinr...@apache.org> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Thank you Herman!
>>>>>>>>
>>>>>>>> On Thu, Nov 28, 2024 at 3:37 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Wed, Nov 27, 2024 at 09:16 Denny Lee <denny.g....@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 27, 2024 at 3:07 AM Martin Grund <mar...@databricks.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> As part of the discussion on this topic, I would love to highlight the work that the community is currently doing to support SparkML, which is traditionally very RDD-heavy, natively in Spark Connect. Bobby's awesome work shows that, over time, we can extend the features of Spark Connect and support workloads that we previously thought could not be supported easily.
>>>>>>>>>>> https://github.com/apache/spark/pull/48791
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Nov 27, 2024 at 11:42 AM Yang,Jie(INF) <yangji...@baidu.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> -------- Original message --------
>>>>>>>>>>>> From: Hyukjin Kwon <gurwls...@apache.org>
>>>>>>>>>>>> Date: 2024-11-27 08:04:06
>>>>>>>>>>>> Subject: [External Mail] Re: Spark Connect the default API in Spark 4.0
>>>>>>>>>>>> To: Bjørn Jørgensen <bjornjorgen...@gmail.com>
>>>>>>>>>>>> Cc: Herman van Hovell <her...@databricks.com.invalid>; Spark dev list <dev@spark.apache.org>
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 25 Nov 2024 at 23:33, Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 25 Nov 2024 at 14:48, Herman van Hovell <her...@databricks.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to start a discussion on "Spark Connect the default API in Spark 4.0".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The rationale for this change is that Spark Connect brings a lot of improvements with respect to simplicity, stability, isolation, upgradability, and extensibility (all detailed in the SPIP). In a nutshell: we want to introduce a flag, spark.api.mode, that allows a user to choose between classic and connect mode, the default being connect. A user can easily fall back to classic by setting spark.api.mode to classic.
>>>>>>>>>>>>>> SPIP: https://docs.google.com/document/d/1C0kuQEliG78HujVwdnSk0wjNwHEDdwo2o8aVq7kbhTo/edit?tab=t.0#heading=h.r2c3xrbiklu3
>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-50411
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am looking forward to your feedback!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Herman
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Bjørn Jørgensen
>>>>>>>>>>>>> Vestre Aspehaug 4, 6010 Ålesund, Norge
>>>>>>>>>>>>> +47 480 94 297
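For readers following along: the mechanism the thread is voting on is the single `spark.api.mode` configuration described in the SPIP above. A minimal sketch of the fallback it would enable (the application class and jar names here are hypothetical; the flag name and its `connect`/`classic` values come from the proposal):

```shell
# Under the proposal, Spark 4.0 would default spark.api.mode to "connect".
# An existing RDD-based application would opt back into the classic API
# with one configuration flag at submit time:
spark-submit \
  --conf spark.api.mode=classic \
  --class com.example.LegacyRddJob \
  legacy-rdd-job.jar
```

The same setting could also go into `spark-defaults.conf`, which is how the "change *one* configuration during startup" argument above would play out for organizations with large existing codebases.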