+1 for dropping Spark 1 support. I don't think we have enough users to justify supporting both, and it's been a long time since this idea originally came up (when Spark 2 wasn't stable); Spark 2 is now standard in all Hadoop distros. As for switching to the DataFrame API: as long as Spark 2 doesn't support scanning through the state periodically (even when there is no new data for a key), watermarks won't fire for keys that didn't see updates.
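To make the watermark concern concrete, here is a toy model in plain Python (not Spark or Beam API; the window-end constant and function names are made up for illustration). It shows why firing timers only on keys that received input is not enough: a key that goes quiet must still have its window emitted once the watermark passes, which requires a periodic scan over all keyed state.

```python
# Toy per-key state with a watermark-driven timer. Key "b" receives no
# event in the second batch, yet its window still fires when the
# watermark passes the window end -- because we scan ALL keys.

WINDOW_END = 10  # hypothetical end timestamp of the single window

def process_batch(state, batch, watermark):
    """Buffer events per key, then scan every key and emit windows
    whose end is behind the watermark."""
    for key, ts in batch:
        state.setdefault(key, []).append(ts)
    emitted = {}
    for key in list(state):          # scan all keys, not just updated ones
        if watermark >= WINDOW_END:  # watermark passed the window end
            emitted[key] = state.pop(key)
    return emitted

state = {}
process_batch(state, [("a", 3), ("b", 4)], watermark=5)  # nothing fires yet
out = process_batch(state, [("a", 7)], watermark=12)
# "b" fires here even though it saw no data in the second batch
```

If the runtime only visited keys present in the current batch, "b" would stay buffered forever; that is the gap being pointed at above.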
On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise <t...@apache.org> wrote:

> +1 (non-binding) for dropping 1.x support
>
> I don't have the impression that there is significant adoption for Beam on
> Spark 1.x? A stronger Spark runner that works well on 2.x will be better
> for Beam adoption than a runner that has to compromise due to 1.x baggage.
> Development efforts can go into improving the runner.
>
> Thanks,
> Thomas
>
>
> On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy <srinivas96all...@gmail.com> wrote:
>
> > +1
> >
> > --
> > Srinivas Reddy
> >
> > http://mrsrinivas.com/
> >
> > (Sent via gmail web)
> >
> > On 8 November 2017 at 14:27, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >
> > > Hi all,
> > >
> > > as you might know, we are working on Spark 2.x support in the Spark runner.
> > >
> > > I'm working on a PR about that:
> > >
> > > https://github.com/apache/beam/pull/3808
> > >
> > > Today, we have something working with both Spark 1.x and 2.x from a code
> > > standpoint, but I have to deal with dependencies. It's the first step of
> > > the update as I'm still using RDD; the second step would be to support
> > > dataframe (but for that, I would need PCollection elements with schemas,
> > > that's another topic on which Eugene, Reuven and I are discussing).
> > >
> > > However, as all major distributions now ship Spark 2.x, I don't think it's
> > > required anymore to support Spark 1.x.
> > >
> > > If we agree, I will update and clean up the PR to only support and focus on
> > > Spark 2.x.
> > >
> > > So, that's why I'm calling for a vote:
> > >
> > > [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > [ ] 0 (I don't care ;))
> > > [ ] -1, I would like to still support Spark 1.x, and so having support
> > > of both Spark 1.x and 2.x (please provide specific comment)
> > >
> > > This vote is open for 48 hours (I have the commits ready, just waiting the
> > > end of the vote to push on the PR).
> > >
> > > Thanks !
> > > Regards
> > > JB
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com