Merging to master sounds like a really good idea, even if it is not feature-complete yet.
It's already a pretty big accomplishment getting it to the current state
(great job all!). Merging it into master would give it a good boost in
visibility and encourage some discussion about where it's going.

I don't think there's any question of removing the RDD-based (a.k.a.
old/legacy/stable) Spark runner yet!

All my best,
Ryan

On Thu, Oct 10, 2019 at 2:47 PM Jean-Baptiste Onofré <[email protected]> wrote:
>
> +1
>
> As the runner seems almost "equivalent" to the one we have, it makes sense.
>
> Question is: do we keep the "old" Spark runner for a while or not (or
> just keep the previous version/tag in git)?
>
> Regards
> JB
>
> On 10/10/2019 09:39, Etienne Chauchot wrote:
> > Hi guys,
> >
> > You probably know that work has been under way for several months on a
> > new Spark runner based on the Spark Structured Streaming framework.
> > This work lives in a feature branch here:
> > https://github.com/apache/beam/tree/spark-runner_structured-streaming
> >
> > To attract more contributors and get some user feedback, we think it is
> > time to merge it to master. Before doing so, two steps need to be
> > completed:
> >
> > - finish the work on Spark Encoders (which allow calling Beam coders),
> > because right now the runner is in an unstable state: some transforms
> > use the new way of doing ser/de and some use the old one, making a
> > pipeline inconsistent with regard to serialization
> >
> > - clean up the history: it contains commits going back to November 2018,
> > so there is a good amount of work and therefore a large number of
> > commits. They have already been squashed, but not since September 2019.
> >
> > Regarding status:
> >
> > - the runner passes 89% of the validates-runner tests in batch mode; we
> > hope to pass more with the new Encoders
> >
> > - streaming mode is barely started (waiting for multi-aggregation
> > support in the Spark Structured Streaming framework from the Spark
> > community)
> >
> > - the runner can execute Nexmark
> >
> > - some things are not wired up yet:
> >
> >   - Beam schemas are not wired to Spark schemas
> >
> >   - optional features of the model are not implemented: state API,
> >     timer API, splittable DoFn API, …
> >
> > WDYT, can we merge it to master once the two steps are done?
> >
> > Best
> >
> > Etienne
> >
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
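
[Editor's note] For readers following the Encoders point above: the ser/de
concern is that every transform should route elements through the pipeline's
Beam coder rather than through Spark's generic serialization, so that all
stages agree on the byte format. Below is a minimal sketch of that Beam coder
round-trip, the byte-level contract a Beam-backed Spark Encoder would delegate
to. It is illustrative only: the class and helper names are hypothetical and
it does not show the runner's actual Encoder wiring.

  import org.apache.beam.sdk.coders.Coder;
  import org.apache.beam.sdk.coders.StringUtf8Coder;
  import org.apache.beam.sdk.util.CoderUtils;

  public class BeamCoderRoundTrip {
    // Serialize an element with its Beam coder so every stage agrees on the wire format.
    static <T> byte[] toBytes(Coder<T> coder, T value) throws Exception {
      return CoderUtils.encodeToByteArray(coder, value);
    }

    // Deserialize with the same coder; mixing this path with Spark's own
    // serialization in other transforms is what makes a pipeline inconsistent.
    static <T> T fromBytes(Coder<T> coder, byte[] bytes) throws Exception {
      return CoderUtils.decodeFromByteArray(coder, bytes);
    }

    public static void main(String[] args) throws Exception {
      Coder<String> coder = StringUtf8Coder.of();
      byte[] bytes = toBytes(coder, "hello");
      System.out.println(fromBytes(coder, bytes)); // prints "hello"
    }
  }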
