I'm +1 on dropping Spark 1. There are a lot of exciting improvements in Spark 2, and trying to write efficient code that runs on both Spark 1 and Spark 2 is super painful in the long term. It would be one thing if there were a lot of people available to work on the Spark runners, but it seems like our energy would be better spent focusing on the future.
I don't know a lot of folks who are stuck on Spark 1, and the few that I know are planning to migrate in the next few months anyway.

Note: this is a non-binding vote as I'm not a committer or PMC member.

On Wed, Nov 8, 2017 at 3:43 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Having both Spark1 and Spark2 modules would benefit a wider user base.
>
> I would vote for that.
>
> Cheers
>
> On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> > Hi Robert,
> >
> > Thanks for your feedback !
> >
> > From a user perspective, with the current state of the PR, the same pipelines can run on both Spark 1.x and 2.x: the only difference is the dependency set.
> >
> > I'm calling the vote to get exactly this kind of feedback: if we consider that Spark 1.x still needs to be supported, no problem, I will improve the PR to have three modules (common, spark1, spark2) and let users pick the desired version.
> >
> > Let's wait a bit for other feedback; I will update the PR accordingly.
> >
> > Regards
> > JB
> >
> > On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
> >
> >> I'm generally a -0.5 on this change, or at least on doing so hastily.
> >>
> >> As with dropping Java 7 support, I think this should at least be announced in the release notes, saying that we're considering dropping support in the subsequent release, as this dev list likely does not reach a substantial portion of the userbase.
> >>
> >> How much work is it to move from a Spark 1.x cluster to a Spark 2.x cluster? I get the feeling it's not nearly as transparent as upgrading Java versions. Can Spark 1.x pipelines be run on Spark 2.x clusters, or is a new cluster (and/or upgrading all pipelines) required (e.g. for those who operate Spark clusters shared among their many users)?
> >>
> >> It looks like the latest release of Spark 1.x was about a year ago, overlapping a bit with the 2.x series, which is coming up on 1.5 years old, so I could see a lot of people still using 1.x even if 2.x is clearly the future. But it sure doesn't seem very backwards compatible.
> >>
> >> Mostly I'm not comfortable with dropping 1.x in the same release as adding support for 2.x, giving no transition period, but I could be convinced if this transition is mostly a no-op or if no one's still using 1.x. If there are non-trivial code complexity issues, I would perhaps revisit the idea of having a single Spark runner that chooses the backend implicitly, in favor of simply having two runners which share the code that's easy to share and diverge otherwise (which seems like it would be much simpler both to implement and to explain to users). I would be OK with even letting the Spark 1.x runner be somewhat stagnant (e.g. few or no new features) until we decide we can kill it off.
> >>
> >> On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >>> Hi all,
> >>>
> >>> As you might know, we are working on Spark 2.x support in the Spark runner.
> >>>
> >>> I'm working on a PR about that:
> >>>
> >>> https://github.com/apache/beam/pull/3808
> >>>
> >>> Today, we have something working with both Spark 1.x and 2.x from a code standpoint, but I have to deal with dependencies. It's the first step of the update, as I'm still using RDDs; the second step would be to support DataFrames (but for that, I would need PCollection elements with schemas, which is another topic that Eugene, Reuven, and I are discussing).
> >>>
> >>> However, as all major distributions now ship Spark 2.x, I don't think it's required anymore to support Spark 1.x.
> >>>
> >>> If we agree, I will update and clean up the PR to only support and focus on Spark 2.x.
> >>>
> >>> So, that's why I'm calling for a vote:
> >>>
> >>> [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> >>> [ ] 0 (I don't care ;))
> >>> [ ] -1, I would like to still support Spark 1.x, and so keep support for both Spark 1.x and 2.x (please provide a specific comment)
> >>>
> >>> This vote is open for 48 hours (I have the commits ready, just waiting for the end of the vote to push to the PR).
> >>>
> >>> Thanks !
> >>> Regards
> >>> JB
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
Twitter: https://twitter.com/holdenkarau
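
[Editor's note: to illustrate JB's point that "the only difference is the dependency set", below is a minimal sketch of a user-facing Beam pipeline on the Spark runner. It uses the standard Beam Java API (SparkRunner, SparkPipelineOptions, TextIO); the file paths are placeholders, and the eventual Maven coordinates of the spark1/spark2 modules were still being decided in the PR, so the dependency side is left as an assumption. The point is that the pipeline code itself never references a Spark version.]

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.runners.spark.SparkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SparkRunnerSketch {
  public static void main(String[] args) {
    // Runner selection happens via options; the pipeline definition does not
    // reference any Spark version, which is why only the runner module you
    // depend on (Spark 1.x- or 2.x-based) would need to change.
    SparkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(SparkPipelineOptions.class);
    options.setRunner(SparkRunner.class);

    Pipeline p = Pipeline.create(options);
    p.apply("ReadLines", TextIO.read().from("/tmp/input.txt"))   // placeholder input path
     .apply("WriteLines", TextIO.write().to("/tmp/output"));     // placeholder output path
    p.run().waitUntilFinish();
  }
}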