I'm generally a -0.5 on this change, or at least on doing it hastily. As with dropping Java 7 support, I think we should at least announce in the release notes that we're considering dropping support in a subsequent release, since this dev list likely doesn't reach a substantial portion of the user base.
How much work is it to move from a Spark 1.x cluster to a Spark 2.x cluster? I get the feeling it's not nearly as transparent as upgrading Java versions. Can Spark 1.x pipelines be run on Spark 2.x clusters, or is a new cluster (and/or upgrading all pipelines) required (e.g. for those who operate Spark clusters shared among many users)? It looks like the latest release of Spark 1.x was about a year ago, overlapping a bit with the 2.x series, which is coming up on 1.5 years old, so I could see a lot of people still using 1.x even if 2.x is clearly the future. But it sure doesn't seem very backwards compatible.

Mostly, I'm not comfortable with dropping 1.x support in the same release that adds 2.x support, giving no transition period, but I could be convinced if the transition is mostly a no-op or if no one is still using 1.x.

If there are non-trivial code-complexity issues, I would perhaps revisit the idea of a single Spark runner that chooses the backend implicitly, in favor of simply having two runners that share the code that's easy to share and diverge otherwise (which seems much simpler both to implement and to explain to users). I would be OK with even letting the Spark 1.x runner be somewhat stagnant (e.g. few or no new features) until we decide we can kill it off.

On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Hi all,
>
> As you might know, we are working on Spark 2.x support in the Spark runner.
>
> I'm working on a PR about that:
>
> https://github.com/apache/beam/pull/3808
>
> Today, we have something working with both Spark 1.x and 2.x from a code
> standpoint, but I have to deal with dependencies. It's the first step of
> the update, as I'm still using RDD; the second step would be to support
> dataframe (but for that, I would need PCollection elements with schemas,
> which is another topic that Eugene, Reuven and I are discussing).
>
> However, as all major distributions now ship Spark 2.x, I don't think it's
> required anymore to support Spark 1.x.
>
> If we agree, I will update and clean up the PR to only support and focus
> on Spark 2.x.
>
> So, that's why I'm calling for a vote:
>
> [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> [ ] 0 (I don't care ;))
> [ ] -1, I would like to still support Spark 1.x, and so have support for
>     both Spark 1.x and 2.x (please provide a specific comment)
>
> This vote is open for 48 hours (I have the commits ready, just waiting for
> the end of the vote to push to the PR).
>
> Thanks!
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com