Having both Spark 1.x and Spark 2.x modules would benefit a wider user base.

I would vote for that.

Cheers

On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Robert,
>
> Thanks for your feedback !
>
> From a user perspective, with the current state of the PR, the same
> pipelines can run on both Spark 1.x and 2.x: the only difference is the
> set of dependencies.
>
> I'm calling the vote to get exactly this kind of feedback: if we consider
> that Spark 1.x still needs to be supported, no problem, I will improve the
> PR to have three modules (common, spark1, spark2) and let users pick the
> desired version.
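>
> For illustration, here is a minimal sketch of the user-facing side
> (assuming the Beam Java SDK and a Spark runner artifact on the classpath;
> the file paths are just placeholders): the pipeline code never references a
> Spark version, so choosing a spark1 or spark2 module would only change the
> runner dependency, not the code.
>
> import org.apache.beam.runners.spark.SparkPipelineOptions;
> import org.apache.beam.runners.spark.SparkRunner;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
>
> public class SparkVersionAgnosticPipeline {
>   public static void main(String[] args) {
>     // The runner is selected via options; the pipeline itself uses no Spark APIs.
>     SparkPipelineOptions options =
>         PipelineOptionsFactory.fromArgs(args).as(SparkPipelineOptions.class);
>     options.setRunner(SparkRunner.class);
>
>     Pipeline p = Pipeline.create(options);
>     // A trivial read/write pipeline; the transforms are Spark-agnostic.
>     p.apply(TextIO.read().from("input.txt"))
>      .apply(TextIO.write().to("output"));
>     p.run().waitUntilFinish();
>   }
> }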
>
> Let's wait a bit for other feedback; I will update the PR accordingly.
>
> Regards
> JB
>
>
> On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
>
>> I'm generally a -0.5 on this change, or at least on doing it hastily.
>>
>> As with dropping Java 7 support, I think we should at least announce in
>> the release notes that we're considering dropping support in the
>> subsequent release, as this dev list likely does not reach a substantial
>> portion of the user base.
>>
>> How much work is it to move from a Spark 1.x cluster to a Spark 2.x
>> cluster? I get the feeling it's not nearly as transparent as upgrading
>> Java versions. Can Spark 1.x pipelines be run on Spark 2.x clusters,
>> or is a new cluster (and/or upgrading all pipelines) required (e.g.
>> for those who operate Spark clusters shared among their many users)?
>>
>> Looks like the latest release of Spark 1.x was about a year ago,
>> overlapping a bit with the 2.x series, which is coming up on 1.5 years
>> old, so I could see a lot of people still using 1.x even if 2.x is
>> clearly the future. But it sure doesn't seem very backwards
>> compatible.
>>
>> Mostly I'm not comfortable with dropping 1.x in the same release as
>> adding support for 2.x, giving no transition period, but could be
>> convinced if this transition is mostly a no-op or no one's still using
>> 1.x. If there are non-trivial code complexity issues, I would perhaps
>> revisit the idea of having a single Spark runner that chooses the backend
>> implicitly, in favor of simply having two runners which share the code
>> that's easy to share and diverge otherwise (which seems like it would be
>> much simpler both to implement and explain to users). I
>> would be OK with even letting the Spark 1.x runner be somewhat
>> stagnant (e.g. few or no new features) until we decide we can kill it
>> off.
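>>
>> To make that concrete, a purely hypothetical sketch of the "two runners
>> sharing common code" shape (the class and module names below are invented
>> for illustration, not the actual Beam layout):
>>
>> import org.apache.beam.sdk.Pipeline;
>> import org.apache.beam.sdk.PipelineResult;
>> import org.apache.beam.sdk.PipelineRunner;
>>
>> // Hypothetical "spark-common" module: version-independent pieces
>> // (options handling, transform translation helpers) would live here.
>> abstract class SparkRunnerBase extends PipelineRunner<PipelineResult> {
>> }
>>
>> // Hypothetical Spark 1.x runner in its own module, built against Spark 1.x.
>> class Spark1Runner extends SparkRunnerBase {
>>   @Override
>>   public PipelineResult run(Pipeline pipeline) {
>>     // Translation to the Spark 1.x RDD API would go here (omitted).
>>     throw new UnsupportedOperationException("sketch only");
>>   }
>> }
>>
>> // Hypothetical Spark 2.x runner in its own module, built against Spark 2.x.
>> class Spark2Runner extends SparkRunnerBase {
>>   @Override
>>   public PipelineResult run(Pipeline pipeline) {
>>     // Translation to the Spark 2.x RDD or Dataset API would go here (omitted).
>>     throw new UnsupportedOperationException("sketch only");
>>   }
>> }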
>>
>> On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi all,
>>>
>>> as you might know, we are working on Spark 2.x support in the Spark
>>> runner.
>>>
>>> I'm working on a PR about that:
>>>
>>> https://github.com/apache/beam/pull/3808
>>>
>>> Today, we have something working with both Spark 1.x and 2.x from a code
>>> standpoint, but I have to deal with dependencies. It's the first step of
>>> the update, as I'm still using RDDs; the second step would be to support
>>> DataFrames (but for that, I would need PCollection elements with schemas,
>>> which is another topic that Eugene, Reuven and I are discussing).
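>>>
>>> To illustrate the difference between the two steps with plain Spark Java
>>> code (this is just the Spark API, not the Beam runner internals, and the
>>> file path is a placeholder): the RDD API carries no schema, while the
>>> Dataset/DataFrame API does, which is why schema-aware PCollection
>>> elements would be needed first.
>>>
>>> import org.apache.spark.api.java.JavaRDD;
>>> import org.apache.spark.api.java.JavaSparkContext;
>>> import org.apache.spark.sql.Dataset;
>>> import org.apache.spark.sql.Row;
>>> import org.apache.spark.sql.SparkSession;
>>>
>>> public class RddVsDataset {
>>>   public static void main(String[] args) {
>>>     SparkSession spark =
>>>         SparkSession.builder().appName("rdd-vs-dataset").getOrCreate();
>>>
>>>     // RDD API (the Spark 1.x style, still available in 2.x): schema-less
>>>     // collections of objects.
>>>     JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
>>>     JavaRDD<String> lines = jsc.textFile("input.txt");
>>>     System.out.println(lines.count());
>>>
>>>     // Dataset/DataFrame API (the Spark 2.x direction): rows carry a schema.
>>>     Dataset<Row> df = spark.read().text("input.txt");
>>>     df.printSchema();
>>>
>>>     spark.stop();
>>>   }
>>> }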
>>>
>>> However, as all major distributions now ship Spark 2.x, I don't think it's
>>> required anymore to support Spark 1.x.
>>>
>>> If we agree, I will update and clean up the PR to focus on and support
>>> only Spark 2.x.
>>>
>>> So, that's why I'm calling for a vote:
>>>
>>>    [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>>>    [ ] 0 (I don't care ;))
>>>    [ ] -1, I would like to still support Spark 1.x, and so have support
>>> for both Spark 1.x and 2.x (please provide a specific comment)
>>>
>>> This vote is open for 48 hours (I have the commits ready, just waiting for
>>> the end of the vote to push them to the PR).
>>>
>>> Thanks !
>>> Regards
>>> JB
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
