Re: [spark structured streaming runner] merge to master?

Xinyu Liu Thu, 10 Oct 2019 10:35:47 -0700

+1 for merging to master. It will make it much easier for us to try it out
and to contribute the missing features.


Thanks,
Xinyu

On Thu, Oct 10, 2019 at 6:40 AM Alexey Romanenko <[email protected]>
wrote:

> +1 for merging this new runner too (even if it’s not 100% ready for the
> moment), provided it doesn’t break/fail/affect any other tests or Jenkins
> jobs. I mean, it should be transparent to other Beam components.
>
> Also, since it won’t be officially “released” right after merging, we need
> to clearly warn users that it’s not ready to use in production.
>
> > On 10 Oct 2019, at 15:25, Ryan Skraba <[email protected]> wrote:
> >
> > Merging to master sounds like a really good idea, even if it is not
> > feature-complete yet.
> >
> > It's already a pretty big accomplishment getting it to the current
> > state (great job all!).  Merging it into master would give it a pretty
> > good boost for visibility and encouraging some discussion about where
> > it's going.
> >
> > I don't think there's any question about removing the RDD-based
> > (a.k.a. old/legacy/stable) spark runner yet!
> >
> > All my best, Ryan
> >
> >
> > On Thu, Oct 10, 2019 at 2:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
> >>
> >> +1
> >>
> >> As the runner seems almost "equivalent" to the one we have, it makes sense.
> >>
> >> The question is: do we keep the "old" Spark runner for a while or not
> >> (or just keep it in a previous version/tag in git)?
> >>
> >> Regards
> >> JB
> >>
> >> On 10/10/2019 09:39, Etienne Chauchot wrote:
> >>> Hi guys,
> >>>
> >>> You probably know that work has been under way for several months on
> >>> a new Spark runner based on the Spark Structured Streaming
> >>> framework. This work is located in a feature branch here:
> >>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
> >>>
> >>> To attract more contributors and get some user feedback, we think it is
> >>> time to merge it to master. Before doing so, some steps need to be
> >>> completed:
> >>>
> >>> - finish the work on Spark Encoders (which allow calling Beam coders)
> >>> because, right now, the runner is in an unstable state (some transforms
> >>> use the new way of doing ser/de and some use the old one, making a
> >>> pipeline inconsistent with respect to serialization)
> >>>
> >>> - clean history: The history contains commits going back to November
> >>> 2018, so there is a good amount of work and thus a considerable number
> >>> of commits. They were already squashed, but not those since September 2019
> >>>
> >>> Regarding status:
> >>>
> >>> - the runner passes 89% of the ValidatesRunner tests in batch mode. We
> >>> hope to pass more with the new Encoders
> >>>
> >>> - Streaming mode is barely started (waiting for support for multiple
> >>> aggregations in the Spark Structured Streaming framework from the
> >>> Spark community)
> >>>
> >>> - Runner can execute Nexmark
> >>>
> >>> - Some things are not wired up yet
> >>>
> >>>    - Beam Schemas not wired with Spark Schemas
> >>>
> >>>    - Optional features of the model not implemented: State API, Timer
> >>> API, splittable DoFn API, …
> >>>
> >>> WDYT, can we merge it to master once the 2 steps are done?
> >>>
> >>> Best
> >>>
> >>> Etienne
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> [email protected]
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
>
>
