Hi Rui,
Thanks for offering to contribute to this new runner!
Here are the pointers:
- SS runner branch: https://github.com/apache/beam/tree/spark-runner_structured-streaming
- Spark design doc for multiple watermarks support: https://docs.google.com/document/d/1IAH9UQJPUiUCLd7H6dazRK2k1szDX38SnM6GVNZYvUo/edit#t#
There is also a good discussion in this Spark PR: https://github.com/apache/spark/pull/23576
As Alexey mentioned in this thread, the SS runner feature branch will be merged
into master once the runner is in good shape. I don't think we will wait for the
streaming part: it requires a deep change in Spark core plus the implementation
of the streaming part of the Beam runner, so it would take too long. IMHO, we
need to get the batch mode of the new runner into a stable state (ongoing
encoders work, fixing the poor performance of the 2 Nexmark queries, ...) before
merging.
Best,
Etienne
On Wednesday, September 11, 2019 at 18:02 -0700, Rui Wang wrote:
> Hello Etienne,
> Your slides mentioned that streaming mode development is blocked because Spark
> lacks support for multiple aggregations in its streaming mode, but that the
> design is ongoing. Do you have a link to their design discussion or doc?
>
>
> -Rui
> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot wrote:
> > Hi Rahul,
> > Sure, and great! Thanks for offering to contribute! If you want details, here
> > is the presentation I gave 30 minutes ago at ApacheCon. The video will be on
> > YouTube shortly, but in the meantime, here are my presentation slides.
> > And here is the Structured Streaming branch. I'll be happy to review your
> > PRs, thanks!
> > https://github.com/apache/beam/tree/spark-runner_structured-streaming
> > Best,
> > Etienne
> > On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
> > > Hi Etienne,
> > >
> > > I learned about the work on the Structured Streaming Spark Runner from
> > > the Apache Beam wiki page "Works in Progress".
> > > I have previously contributed to BeamSql, and I am currently working on
> > > supporting PCollectionView in BeamSql.
> > >
> > > I would love to understand the runner side of Apache Beam and
> > > contribute to the Structured Streaming Spark Runner.
> > >
> > > Can you please point me in the right direction?
> > >
> > > Thanks,
> > > Rahul