Hi all,
thanks a lot for all your feedback.
The trend is toward upgrading to Spark 2.x and dropping Spark 1.x support.
However, some of you (especially Reuven and Robert) commented that users have to
be pinged as well. It makes perfect sense, and it was my intention.
I propose the following action plan:
- on the technical front, I currently have two private branches ready: one
with Spark 1.x & Spark 2.x support (with a common module and three artifacts),
another one with an upgrade to Spark 2.x (dropping 1.x). I will merge the latter
into the PR.
- I will forward the vote e-mail to the user mailing list; hopefully we will
get user feedback.
Thanks again,
Regards
JB
On 11/08/2017 08:27 AM, Jean-Baptiste Onofré wrote:
Hi all,
as you might know, we are working on Spark 2.x support in the Spark runner.
I'm working on a PR about that:
https://github.com/apache/beam/pull/3808
Today, we have something working with both Spark 1.x and 2.x from a code
standpoint, but I have to deal with dependencies. It's the first step of the
update, as I'm still using RDDs. The second step would be to support DataFrames
(but for that, I would need PCollection elements with schemas; that's another
topic that Eugene, Reuven and I are discussing).
However, as all major distributions now ship Spark 2.x, I don't think it's
required anymore to support Spark 1.x.
If we agree, I will update and clean up the PR to only support and focus on Spark
2.x.
So, that's why I'm calling for a vote:
[ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
[ ] 0 (I don't care ;))
[ ] -1, I would like to still support Spark 1.x, and so have support for
both Spark 1.x and 2.x (please provide a specific comment)
This vote is open for 48 hours (I have the commits ready, just waiting for the end
of the vote to push to the PR).
Thanks!
Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com