Hi guys,

To illustrate the current discussion about Spark version support, you can take a look at:

--
Spark 1 & Spark 2 Support Branch

https://github.com/jbonofre/beam/tree/BEAM-1920-SPARK2-MODULES

This branch contains a Spark runner common module compatible with both Spark 1.x and 2.x. For convenience, we introduced spark1 & spark2 modules/artifacts, each containing just a pom.xml that defines the dependency set.
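
For instance, a user targeting Spark 2.x would only declare the spark2 artifact in their pom.xml. A minimal sketch (the artifact ID below is my assumption based on the branch layout, not a final name):

    <!-- Hypothetical artifact ID, based on the spark2 module in the branch -->
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-spark2</artifactId>
      <version>${beam.version}</version>
    </dependency>

A Spark 1.x user would swap in the corresponding spark1 artifact; the common module would presumably come in transitively in both cases.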

--
Spark 2 Only Branch

https://github.com/jbonofre/beam/tree/BEAM-1920-SPARK2-ONLY

This branch upgrades to Spark 2.x and drops support for Spark 1.x.

As I'm ready to merge one or the other in the PR, I would like to complete the vote/discussion pretty soon.

Correct me if I'm wrong, but it seems that the preference is to drop Spark 1.x and focus only on Spark 2.x (i.e., the Spark 2 Only Branch).

I would like to call a final vote to decide which branch I will merge:

    [ ] Use Spark 1 & Spark 2 Support Branch
    [ ] Use Spark 2 Only Branch

This informal vote is open for 48 hours.

Please let me know what your preference is.

Thanks!
Regards
JB

On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
Hi Beamers,

I'm forwarding this discussion & vote from the dev mailing list to the user mailing list.
The goal is to get your feedback as users.

Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three artifacts (common, spark1, spark2). You, as users, pick spark1 or spark2 in your dependency set depending on the target Spark version.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If you still want to use Spark 1.x, you will have to stay on Beam 2.2.0 or earlier.

Thoughts?

Thanks!
Regards
JB


-------- Forwarded Message --------
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré <j...@nanthrax.net>
Reply-To: dev@beam.apache.org
To: dev@beam.apache.org

Hi all,

As you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code standpoint, but I still have to deal with dependencies. This is only the first step of the update, as the runner still uses RDDs; the second step would be to support DataFrames, but for that I would need PCollection elements with schemas (another topic that Eugene, Reuven, and I are discussing).
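
For context, here is a minimal sketch of how a user targets the Spark runner today with the RDD-based translation; the local master and the trivial transform are placeholders for illustration, and nothing here depends on the Spark 1.x vs 2.x choice at the Beam API level:

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.runners.spark.SparkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;

    public class SparkRunnerSketch {
      public static void main(String[] args) {
        // Select the Spark runner via pipeline options.
        SparkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
        options.setRunner(SparkRunner.class);
        options.setSparkMaster("local[2]"); // embedded Spark, for illustration only

        // The pipeline itself is runner-agnostic; the Spark runner translates it to RDDs.
        Pipeline pipeline = Pipeline.create(options);
        pipeline.apply(Create.of("hello", "beam", "on", "spark"));
        pipeline.run().waitUntilFinish();
      }
    }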

However, as all major distributions now ship Spark 2.x, I don't think supporting Spark 1.x is required anymore.

If we agree, I will update and cleanup the PR to only support and focus on Spark 2.x.

So, that's why I'm calling for a vote:

   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to keep supporting Spark 1.x, and so have support for both Spark 1.x and 2.x (please provide a specific comment)

This vote is open for 48 hours (I have the commits ready, and I'm just waiting for the end of the vote to push to the PR).

Thanks!
Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
