+1 from me, with a friendly deprecation process

I am convinced by the following:

 - We don't have the resources to make both great, and anyhow it isn't
worth it
 - People keeping up with Beam releases are likely to be keeping up with
Spark as well
 - Spark 1 users already have a Spark 1 runner for Beam and can keep using
it (and we don't actually lose the ability to update it in a pinch)
 - Key features like portability (hence Python) will take some time to
land, so we should definitely not waste effort building them with Spark 1
in mind

I think it makes sense to communicate via email to users@ and in the
release notes of 2.2.0. That communication should be specific and indicate
whether we are planning to merely stop working on it or actually remove it
in 2.3.0.

Kenn

On Thu, Nov 9, 2017 at 6:35 AM, Amit Sela <amitsel...@gmail.com> wrote:

> +1 for dropping Spark 1 support.
> I don't think we have enough users to justify supporting both, and it's
> been a long time since this idea originally came up (when Spark 2 wasn't
> stable); now Spark 2 is standard in all Hadoop distros.
> As for switching to the DataFrame API: as long as Spark 2 doesn't support
> scanning through state periodically (even when there is no new data for a
> key), watermarks won't fire for keys that didn't see updates.
>
> On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise <t...@apache.org> wrote:
>
> > +1 (non-binding) for dropping 1.x support
> >
> > I don't have the impression that there is significant adoption of Beam
> > on Spark 1.x. A stronger Spark runner that works well on 2.x will be
> > better for Beam adoption than a runner that has to compromise due to
> > 1.x baggage. Development effort can go into improving the runner.
> >
> > Thanks,
> > Thomas
> >
> >
> > > On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy
> > > <srinivas96all...@gmail.com> wrote:
> >
> > > +1
> > >
> > >
> > >
> > > --
> > > Srinivas Reddy
> > >
> > > http://mrsrinivas.com/
> > >
> > >
> > > (Sent via gmail web)
> > >
> > > On 8 November 2017 at 14:27, Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > as you might know, we are working on Spark 2.x support in the Spark
> > > > runner.
> > > >
> > > > I'm working on a PR about that:
> > > >
> > > > https://github.com/apache/beam/pull/3808
> > > >
> > > > Today, we have something working with both Spark 1.x and 2.x from a
> > > > code standpoint, but I have to deal with dependencies. This is the
> > > > first step of the update, as I'm still using RDDs; the second step
> > > > would be to support the DataFrame API (but for that, I would need
> > > > PCollection elements with schemas, which is another topic that
> > > > Eugene, Reuven, and I are discussing).
> > > >
> > > > However, as all major distributions now ship Spark 2.x, I don't
> > > > think it's required anymore to support Spark 1.x.
> > > >
> > > > If we agree, I will update and clean up the PR to only support and
> > > > focus on Spark 2.x.
> > > >
> > > > So, that's why I'm calling for a vote:
> > > >
> > > >   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > >   [ ] 0 (I don't care ;))
> > > >   [ ] -1, I would like to keep supporting Spark 1.x, and so have
> > > > support for both Spark 1.x and 2.x (please provide a specific
> > > > comment)
> > > >
> > > > This vote is open for 48 hours (I have the commits ready, and am
> > > > just waiting for the end of the vote to push to the PR).
> > > >
> > > > Thanks!
> > > > Regards
> > > > JB
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > >
> >
>