Hi all,

Thanks a lot for all your feedback.

The trend is clearly toward upgrading to Spark 2.x and dropping Spark 1.x support.

However, some of you (especially Reuven and Robert) commented that the users have to be pinged as well. That makes perfect sense, and it was my intention.

I propose the following action plan:
- On the technical front, I currently have two private branches ready: one with both Spark 1.x and Spark 2.x support (a common module and three artifacts; see the sketch below), and another with an upgrade to Spark 2.x only (dropping 1.x). I will merge the latter onto the PR.
- I will forward the vote e-mail to the user mailing list, so that we hopefully get user feedback as well.
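For illustration, a dual-support branch along those lines could be laid out roughly as follows (module names here are hypothetical, not necessarily the actual branch layout):

  runners/spark/
    spark-common/   -> code shared by both versions (translation logic, utilities)
    spark1/         -> Spark 1.x specific glue (JavaSparkContext / RDD based)
    spark2/         -> Spark 2.x specific glue (SparkSession based)

A Maven profile or explicit dependency choice would then select which of the version-specific artifacts to pull in alongside the common one.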

Thanks again,
Regards
JB

On 11/08/2017 08:27 AM, Jean-Baptiste Onofré wrote:
Hi all,

as you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code standpoint, but I still have to deal with the dependencies. This is only the first step of the update, as I'm still using the RDD API; the second step would be to support DataFrames (but for that, I would need PCollection elements with schemas, which is another topic that Eugene, Reuven and I are discussing).
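To make the two steps concrete, here is a minimal, self-contained sketch (my own illustration, not the runner's actual code) of why Spark 2.x lets us keep the RDD-based translation for now: the RDD API is still reachable from the new SparkSession entry point, while Datasets/DataFrames remain available for step two once we have element schemas:

  import java.util.Arrays;

  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.api.java.JavaSparkContext;
  import org.apache.spark.sql.SparkSession;

  public class SparkApiSketch {
    public static void main(String[] args) {
      // Spark 2.x unified entry point.
      SparkSession session = SparkSession.builder()
          .master("local[2]")
          .appName("spark2-rdd-sketch")
          .getOrCreate();

      // Step 1: the RDD API is still available via the underlying
      // SparkContext, so an RDD-based translation keeps working on 2.x.
      JavaSparkContext jsc = new JavaSparkContext(session.sparkContext());
      JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("a", "b", "c"));
      System.out.println("count = " + rdd.count());

      // Step 2 (future): move to Datasets/DataFrames, which requires
      // element schemas, e.g. session.createDataFrame(rows, schema).
      session.stop();
    }
  }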

However, as all major distributions now ship Spark 2.x, I don't think it's required anymore to support Spark 1.x.

If we agree, I will update and clean up the PR to support and focus on Spark 2.x only.

So, that's why I'm calling for a vote:

  [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
  [ ] 0 (I don't care ;))
  [ ] -1, I would like to keep supporting Spark 1.x, and so to support both Spark 1.x and 2.x (please provide a specific comment)

This vote is open for 48 hours (I have the commits ready, and I'm just waiting for the end of the vote to push them to the PR).

Thanks!
Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
