+1 on moving forward with Spark 2.x only. Spark 1 users can still use already released Spark runners, and we can support them with minor version releases for future bug fixes.
I don't see it as important to make future Beam releases available to Spark 1 users. If they choose not to upgrade their Spark clusters, maybe they don't need the newest Beam releases either. I think it is more important to 1) be able to leverage new features in Spark 2.x, and 2) extend the user base to Spark 2.

-- Pei

On Thu, Nov 9, 2017 at 1:45 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
> That's a good point about Oozie only supporting Spark 1 or 2 at a time on
> a cluster -- but do we know people using Oozie and Spark 1 that would
> still be using Spark 1 by the time of the next BEAM release? The last
> Spark 1 release was a year ago (and the last non-maintenance release
> almost 20 months ago).
>
> On Wed, Nov 8, 2017 at 9:30 PM, NerdyNick <nerdyn...@gmail.com> wrote:
>
> > I don't know if ditching Spark 1 outright right now would be a great
> > move, given that a lot of the main support applications around Spark
> > haven't fully moved to Spark 2 yet, let alone have support for having a
> > cluster with both. Oozie, for example, is still pre-stable-release for
> > its Spark 1 support and can't handle a cluster with mixed Spark
> > versions. I think maybe doing as suggested above with the common,
> > spark1, spark2 packaging might be best during this carry-over phase.
> > Maybe even just flagging Spark 1 as deprecated and maintenance-only
> > might be enough.
> >
> > On Wed, Nov 8, 2017 at 10:25 PM, Holden Karau <hol...@pigscanfly.ca>
> > wrote:
> >
> > > Also, upgrading Spark 1 to 2 is generally easier than changing JVM
> > > versions. For folks using YARN or the hosted environments it's pretty
> > > much trivial, since you can effectively have distinct Spark clusters
> > > for each job.
> > >
> > > On Wed, Nov 8, 2017 at 9:19 PM, Holden Karau <hol...@pigscanfly.ca>
> > > wrote:
> > >
> > > > I'm +1 on dropping Spark 1. There are a lot of exciting improvements
> > > > in Spark 2, and trying to write efficient code that runs on both
> > > > Spark 1 and Spark 2 is super painful in the long term. It would be
> > > > one thing if there were a lot of people available to work on the
> > > > Spark runners, but it seems like our energy would be better spent
> > > > focusing on the future.
> > > >
> > > > I don't know a lot of folks who are stuck on Spark 1, and the few
> > > > that I know are planning to migrate in the next few months anyway.
> > > >
> > > > Note: this is a non-binding vote as I'm not a committer or PMC
> > > > member.
> > > >
> > > > On Wed, Nov 8, 2017 at 3:43 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > >> Having both Spark1 and Spark2 modules would benefit a wider user
> > > >> base.
> > > >>
> > > >> I would vote for that.
> > > >>
> > > >> Cheers
> > > >>
> > > >> On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré
> > > >> <j...@nanthrax.net> wrote:
> > > >>
> > > >> > Hi Robert,
> > > >> >
> > > >> > Thanks for your feedback!
> > > >> >
> > > >> > From a user perspective, with the current state of the PR, the
> > > >> > same pipelines can run on both Spark 1.x and 2.x: the only
> > > >> > difference is the dependency set.
> > > >> >
> > > >> > I'm calling the vote to get exactly this kind of feedback: if we
> > > >> > consider that Spark 1.x still needs to be supported, no problem,
> > > >> > I will improve the PR to have three modules (common, spark1,
> > > >> > spark2) and let users pick the desired version.
> > > >> >
> > > >> > Let's wait a bit for other feedback; I will update the PR
> > > >> > accordingly.
> > > >> >
> > > >> > Regards
> > > >> > JB
> > > >> >
> > > >> > On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
> > > >> >
> > > >> >> I'm generally a -0.5 on this change, or at least on doing it
> > > >> >> hastily.
> > > >> >>
> > > >> >> As with dropping Java 7 support, I think we should at least
> > > >> >> announce in the release notes that we're considering dropping
> > > >> >> support in the subsequent release, as this dev list likely does
> > > >> >> not reach a substantial portion of the userbase.
> > > >> >>
> > > >> >> How much work is it to move from a Spark 1.x cluster to a Spark
> > > >> >> 2.x cluster? I get the feeling it's not nearly as transparent
> > > >> >> as upgrading Java versions. Can Spark 1.x pipelines be run on
> > > >> >> Spark 2.x clusters, or is a new cluster (and/or upgrading all
> > > >> >> pipelines) required (e.g. for those who operate Spark clusters
> > > >> >> shared among their many users)?
> > > >> >>
> > > >> >> Looks like the latest release of Spark 1.x was about a year
> > > >> >> ago, overlapping a bit with the 2.x series, which is coming up
> > > >> >> on 1.5 years old, so I could see a lot of people still using
> > > >> >> 1.x even if 2.x is clearly the future. But it sure doesn't seem
> > > >> >> very backwards compatible.
> > > >> >>
> > > >> >> Mostly I'm not comfortable with dropping 1.x in the same
> > > >> >> release as adding support for 2.x, giving no transition period,
> > > >> >> but could be convinced if this transition is mostly a no-op or
> > > >> >> no one's still using 1.x. If there are non-trivial code
> > > >> >> complexity issues, I would perhaps revisit the idea of having a
> > > >> >> single Spark runner that chooses the backend implicitly, in
> > > >> >> favor of simply having two runners which share the code that's
> > > >> >> easy to share and diverge otherwise (which seems like it would
> > > >> >> be much simpler both to implement and to explain to users). I
> > > >> >> would be OK with even letting the Spark 1.x runner be somewhat
> > > >> >> stagnant (e.g. few or no new features) until we decide we can
> > > >> >> kill it off.
> > > >> >>
> > > >> >> On Tue, Nov 7, 2017 at 11:27 PM, Jean-Baptiste Onofré
> > > >> >> <j...@nanthrax.net> wrote:
> > > >> >>
> > > >> >>> Hi all,
> > > >> >>>
> > > >> >>> as you might know, we are working on Spark 2.x support in the
> > > >> >>> Spark runner.
> > > >> >>>
> > > >> >>> I'm working on a PR about that:
> > > >> >>>
> > > >> >>> https://github.com/apache/beam/pull/3808
> > > >> >>>
> > > >> >>> Today, we have something working with both Spark 1.x and 2.x
> > > >> >>> from a code standpoint, but I have to deal with dependencies.
> > > >> >>> It's the first step of the update as I'm still using RDD; the
> > > >> >>> second step would be to support dataframes (but for that, I
> > > >> >>> would need PCollection elements with schemas, which is another
> > > >> >>> topic that Eugene, Reuven and I are discussing).
> > > >> >>>
> > > >> >>> However, as all major distributions now ship Spark 2.x, I
> > > >> >>> don't think it's required anymore to support Spark 1.x.
> > > >> >>>
> > > >> >>> If we agree, I will update and clean up the PR to only support
> > > >> >>> and focus on Spark 2.x.
> > > >> >>>
> > > >> >>> So, that's why I'm calling for a vote:
> > > >> >>>
> > > >> >>> [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > >> >>> [ ] 0 (I don't care ;))
> > > >> >>> [ ] -1, I would like to still support Spark 1.x, and so have
> > > >> >>> support for both Spark 1.x and 2.x (please provide a specific
> > > >> >>> comment)
> > > >> >>>
> > > >> >>> This vote is open for 48 hours (I have the commits ready, just
> > > >> >>> waiting for the end of the vote to push to the PR).
> > > >> >>>
> > > >> >>> Thanks!
> > > >> >>> Regards
> > > >> >>> JB
> > > >> >>> --
> > > >> >>> Jean-Baptiste Onofré
> > > >> >>> jbono...@apache.org
> > > >> >>> http://blog.nanthrax.net
> > > >> >>> Talend - http://www.talend.com
> > > >> >
> > > >> > --
> > > >> > Jean-Baptiste Onofré
> > > >> > jbono...@apache.org
> > > >> > http://blog.nanthrax.net
> > > >> > Talend - http://www.talend.com
> > > >
> > > > --
> > > > Twitter: https://twitter.com/holdenkarau
> > >
> > > --
> > > Twitter: https://twitter.com/holdenkarau
> >
> > --
> > Nick Verbeck - NerdyNick
> > ----------------------------------------------------
> > NerdyNick.com
> > TrailsOffroad.com
> > NoKnownBoundaries.com
>
> --
> Twitter: https://twitter.com/holdenkarau
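
[Editor's note: to make the three-module (common, spark1, spark2) proposal discussed above concrete, here is a minimal sketch of how a user might pick the desired Spark version purely through their Maven dependency set, with pipeline code unchanged. The artifact IDs `beam-runners-spark1` and `beam-runners-spark2` are hypothetical, since the final naming depended on the outcome of this vote and the PR.]

```xml
<!-- Hypothetical dependency set for a Beam pipeline targeting Spark 2.x.
     Swapping this artifact for a spark1 counterpart (and the matching
     org.apache.spark 1.x artifacts) would be the only change needed;
     the pipeline code itself stays the same. Version numbers are
     illustrative. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark2</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>
```

Under this scheme, the runner's shared logic would live in the common module, and each spark1/spark2 module would carry only the version-specific glue plus its own transitive Spark dependencies.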