+1 for merging to master. It's going to help us a lot to try it out, and also to contribute back the missing features.
Thanks,
Xinyu

On Thu, Oct 10, 2019 at 6:40 AM Alexey Romanenko <[email protected]> wrote:

> +1 for merging this new runner too (even if it’s not 100% ready for the
> moment), provided it doesn’t break, fail, or affect any other tests and
> Jenkins jobs. I mean, it should be transparent to other Beam components.
>
> Also, since it won’t be officially “released” right after merging, we need
> to clearly warn users that it’s not ready for production use.
>
> > On 10 Oct 2019, at 15:25, Ryan Skraba <[email protected]> wrote:
> >
> > Merging to master sounds like a really good idea, even if it is not
> > feature-complete yet.
> >
> > It's already a pretty big accomplishment getting it to the current
> > state (great job all!). Merging it into master would give it a pretty
> > good boost in visibility and encourage some discussion about where
> > it's going.
> >
> > I don't think there's any question about removing the RDD-based
> > (a.k.a. old/legacy/stable) Spark runner yet!
> >
> > All my best, Ryan
> >
> >
> > On Thu, Oct 10, 2019 at 2:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
> >>
> >> +1
> >>
> >> As the runner seems almost "equivalent" to the one we have, it makes sense.
> >>
> >> Question is: do we keep the "old" Spark runner for a while or not (or
> >> just keep it on a previous version/tag in git)?
> >>
> >> Regards
> >> JB
> >>
> >> On 10/10/2019 09:39, Etienne Chauchot wrote:
> >>> Hi guys,
> >>>
> >>> You probably know that for several months there has been ongoing work
> >>> on a new Spark runner based on the Spark Structured Streaming
> >>> framework. This work is located in a feature branch here:
> >>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
> >>>
> >>> To attract more contributors and get some user feedback, we think it is
> >>> time to merge it to master. Before doing so, some steps need to be
> >>> completed:
> >>>
> >>> - finish the work on Spark Encoders (which allow calling Beam coders),
> >>> because right now the runner is in an unstable state (some transforms
> >>> use the new way of doing ser/de and some use the old one, making a
> >>> pipeline inconsistent with respect to serialization)
> >>>
> >>> - clean the history: it contains commits from November 2018, so there
> >>> is a good amount of work and thus a considerable number of commits.
> >>> They were already squashed, but not the ones from September 2019
> >>>
> >>> Regarding status:
> >>>
> >>> - the runner passes 89% of the ValidatesRunner tests in batch mode. We
> >>> hope to pass more with the new Encoders
> >>>
> >>> - Streaming mode is barely started (waiting for multi-aggregation
> >>> support in the Spark SS framework from the Spark community)
> >>>
> >>> - the runner can execute Nexmark
> >>>
> >>> - some things are not wired up yet:
> >>>
> >>>   - Beam Schemas are not wired with Spark Schemas
> >>>
> >>>   - optional features of the model are not implemented: state api,
> >>>     timer api, splittable doFn api, …
> >>>
> >>> WDYT, can we merge it to master once the 2 steps are done?
> >>>
> >>> Best
> >>>
> >>> Etienne
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> [email protected]
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >
