Thanks all! Yes, I will start a vote thread now.

Shivaram

On Wed, Aug 21, 2024 at 10:24 AM Herman van Hovell <her...@databricks.com>
wrote:

> +1
>
> Let's start a vote?
>
> On Fri, Aug 16, 2024 at 2:05 AM yangjie01 <yangji...@baidu.com.invalid>
> wrote:
>
>> +1
>> -------- Original Message --------
>> From: Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> Date: 2024-08-16 09:06:52
>> Subject: [External Mail] Re: [DISCUSS] Deprecating SparkR
>> To: Wenchen Fan <cloud0...@gmail.com>;
>> Cc: L. C. Hsieh <vii...@gmail.com>; Dongjoon Hyun <dongjoon.h...@gmail.com>;
>> Holden Karau <holden.ka...@gmail.com>; Xiao Li <gatorsm...@gmail.com>;
>> Hyukjin Kwon <gurwls...@apache.org>; Nicholas Chammas <nicholas.cham...@gmail.com>;
>> Shivaram Venkataraman <shivaram.venkatara...@gmail.com>; dev <dev@spark.apache.org>;
>> +1
>>
>> Looks to be sufficient to VOTE?
>>
>> On Wed, Aug 14, 2024 at 1:10 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Tue, Aug 13, 2024 at 10:50 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
>>>> wrote:
>>>> >
>>>> > +1
>>>> >
>>>> > Dongjoon
>>>> >
>>>> > On Mon, Aug 12, 2024 at 17:52 Holden Karau <holden.ka...@gmail.com>
>>>> wrote:
>>>> >>
>>>> >> +1
>>>> >>
>>>> >> Are the sparklyr folks on this list?
>>>> >>
>>>> >> Twitter: https://twitter.com/holdenkarau
>>>> >> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>> >> Pronouns: she/her
>>>> >>
>>>> >>
>>>> >> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li <gatorsm...@gmail.com>
>>>> wrote:
>>>> >>>
>>>> >>> +1
>>>> >>>
>>>> >>> On Mon, Aug 12, 2024 at 16:18, Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>> >>>>
>>>> >>>> +1
>>>> >>>>
>>>> >>>> On Tue, Aug 13, 2024 at 7:04 AM Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> And just for the record, the stats that I screenshotted in that
>>>> thread I linked to showed the following page views for each sub-section
>>>> under `docs/latest/api/`:
>>>> >>>>>
>>>> >>>>> - python: 758K
>>>> >>>>> - java: 66K
>>>> >>>>> - sql: 39K
>>>> >>>>> - scala: 35K
>>>> >>>>> - r: <1K
>>>> >>>>>
>>>> >>>>> I don’t recall what time period those stats were collected
>>>> over, and there are certainly some factors in how the stats are gathered and
>>>> how the various language API docs are accessed that impact those numbers.
>>>> So it’s by no means a solid, objective measure. But I thought it was an
>>>> interesting signal nonetheless.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Aug 12, 2024, at 5:50 PM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> Not an R user myself, but +1.
>>>> >>>>>
>>>> >>>>> I first wondered about the future of SparkR after noticing how
>>>> low the visit stats were for the R API docs as compared to Python and
>>>> Scala. (I can’t seem to find those visit stats for the API docs anymore.)
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Aug 12, 2024, at 11:47 AM, Shivaram Venkataraman <
>>>> shivaram.venkatara...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> Hi
>>>> >>>>>
>>>> >>>>> About ten years ago, I created the original SparkR package as
>>>> part of my research at UC Berkeley [SPARK-5654]. After my PhD I started as
>>>> a professor at UW-Madison and my contributions to SparkR have been in the
>>>> background given my availability. I continue to be involved in the
>>>> community and teach a popular course at UW-Madison which uses Apache Spark
>>>> for programming assignments.
>>>> >>>>>
>>>> >>>>> As the original contributor and author of a research paper on
>>>> SparkR, I also continue to get private emails from users. A common question
>>>> I get is whether one should use SparkR in Apache Spark or the sparklyr
>>>> package (built on top of Apache Spark). You can also see this in
>>>> StackOverflow questions and other blog posts online:
>>>> https://www.google.com/search?q=sparkr+vs+sparklyr .
>>>> While I have encouraged users to choose the SparkR package as it is
>>>> maintained by the Apache project, the more I looked into sparklyr, the more
>>>> I was convinced that it is a better choice for R users that want to
>>>> leverage the power of Spark:
>>>> >>>>>
>>>> >>>>> (1) sparklyr is developed by a community of developers who
>>>> understand the R programming language deeply, and as a result is more
>>>> idiomatic. In hindsight, sparklyr’s more idiomatic approach would have been
>>>> a better choice than the Scala-like API we have in SparkR.
>>>> >>>>>
>>>> >>>>> (2) Contributions to SparkR have decreased slowly. Over the last
>>>> two years, there have been 65 commits on the SparkR codebase (compared to
>>>> ~2200 on the Spark Python codebase). In contrast, sparklyr has over 300
>>>> commits in the same period.
>>>> >>>>>
>>>> >>>>> (3) Previously, using and deploying sparklyr had been cumbersome
>>>> as it needed careful alignment of versions between Apache Spark and
>>>> sparklyr. However, the sparklyr community has implemented a new Spark
>>>> Connect based architecture which eliminates this issue.
>>>> >>>>>
>>>> >>>>> (4) The sparklyr community has maintained their package on CRAN –
>>>> it takes some effort to do this as the CRAN release process requires
>>>> passing a number of tests. While SparkR was on CRAN initially, we could not
>>>> maintain that given our release process and cadence. This makes sparklyr
>>>> much more accessible to the R community.
>>>> >>>>>
>>>> >>>>> So it is with a bittersweet feeling that I’m writing this email
>>>> to propose that we deprecate SparkR, and recommend sparklyr as the R
>>>> language binding for Spark. This will reduce complexity of our own
>>>> codebase, and more importantly reduce confusion for users. As the sparklyr
>>>> package is distributed using the same permissive license as Apache Spark,
>>>> there should be no downside for existing SparkR users in adopting it.
>>>> >>>>>
>>>> >>>>> My proposal is to mark SparkR as deprecated in the upcoming Spark
>>>> 4 release, and remove it from Apache Spark with the following major
>>>> release, Spark 5.
>>>> >>>>>
>>>> >>>>> I’m looking forward to hearing your thoughts and feedback on this
>>>> proposal and I’m happy to create the SPIP ticket for a vote on this
>>>> proposal using this email thread as the justification.
>>>> >>>>>
>>>> >>>>> Thanks
>>>> >>>>> Shivaram
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>
>>>>
