Hi, I would love to join the call. Can you also share the meeting invitation with me?
Thanks,
Rahul

On Wed, 18 Sep 2019, 11:48 PM Xinyu Liu <[email protected]> wrote:
> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please
> forward the meeting info to me. I am based in California, US, and hopefully
> the time will work :).
>
> Thanks,
> Xinyu
>
> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <[email protected]> wrote:
>> Hi Xinyu,
>>
>> Thanks for offering to help! My comments are inline:
>>
>> On Friday, September 13, 2019 at 12:16 -0700, Xinyu Liu wrote:
>>> Hi Etienne,
>>>
>>> The slides are very informative! Thanks for sharing the details about
>>> how the Beam API is mapped onto Spark Structured Streaming.
>>
>> Thanks!
>>
>>> We (LinkedIn) are also interested in trying the new SparkRunner to run
>>> Beam pipelines in batch, and in contributing to it too. From my
>>> understanding, the functionality on the batch side is mostly complete
>>> and covers quite a large percentage of the tests (a few missing pieces
>>> like state and timer in ParDo, and SDF).
>>
>> Correct, it passes 89% of the tests, but there is more missing than SDF,
>> state, and timers; there is also ongoing encoder work that I would like
>> to commit/push before merging.
>>
>>> If so, is it possible to merge the new runner sooner into master so it's
>>> much easier for us to pull it in (we have an internal fork) and
>>> contribute back?
>>
>> Sure, see my other mail on this thread. As Alexey mentioned, please join
>> the sync meeting we have; the more the merrier!
>>
>>> Also curious about the schema part in the runner. It seems we can
>>> leverage the schema-aware work in PCollection and translate from the
>>> Beam schema to Spark, so it can be optimized in the planner layer. It
>>> would be great to hear your plans on that.
>>
>> Well, it is not designed yet, but, if you remember my talk, we need to
>> store Beam windowing information with the data itself, so we end up with
>> a Dataset<WindowedValue>. One lead that was discussed is to store it as
>> a Spark schema such as this:
>>
>> 1. field1: binary data for the Beam windowing information (it cannot be
>> mapped to fields because the Beam windowing info is a complex structure)
>>
>> 2. fields of data as defined in the Beam schema, if there is one
>>
>>> Congrats on this great work!
>>
>> Thanks!
>>
>> Best,
>> Etienne
>>
>>> Thanks,
>>> Xinyu
>>>
>>> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <[email protected]> wrote:
>>>> Hello Etienne,
>>>>
>>>> Your slides mentioned that streaming-mode development is blocked
>>>> because Spark lacks support for multiple aggregations in its streaming
>>>> mode, but a design is ongoing. Do you have a link or something else to
>>>> their design discussion/doc?
>>>>
>>>> -Rui
>>>>
>>>> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <[email protected]> wrote:
>>>>> Hi Rahul,
>>>>> Sure, and great! Thanks for proposing!
>>>>> If you want details, here is the presentation I did 30 minutes ago at
>>>>> ApacheCon. You will find the video on YouTube shortly, but in the
>>>>> meantime, here are my presentation slides.
>>>>>
>>>>> And here is the structured streaming branch. I'll be happy to review
>>>>> your PRs, thanks!
>>>>>
>>>>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>>>>>
>>>>> Best,
>>>>> Etienne
>>>>>
>>>>> On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
>>>>>> Hi Etienne,
>>>>>>
>>>>>> I came to know about the work going on in the Structured Streaming
>>>>>> Spark Runner from the Apache Beam wiki - Works in Progress.
>>>>>> I have contributed to BeamSql earlier, and I am working on supporting
>>>>>> PCollectionView in BeamSql.
>>>>>>
>>>>>> I would love to understand the runner side of Apache Beam and
>>>>>> contribute to the Structured Streaming Spark Runner.
>>>>>>
>>>>>> Can you please point me in the right direction?
>>>>>>
>>>>>> Thanks,
>>>>>> Rahul
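[Editor's note] The row layout Etienne describes above (one opaque binary field for the Beam windowing information, followed by the user's data fields from the Beam schema) can be sketched roughly as below. This is a minimal illustration with hypothetical names (`WindowedRowSketch`, `beamWindowingInfo`, `userValue`), not the actual runner code; the real runner would serialize window, timestamp, and pane info with Beam coders and register the layout as a Spark `StructType`.

```java
import java.nio.charset.StandardCharsets;

// Sketch of the proposed two-part row layout: the Beam windowing
// information is too complex to flatten into columns, so it travels
// as one opaque binary field next to the user's schema fields.
public class WindowedRowSketch {

    // field1: opaque, serialized Beam windowing information
    final byte[] beamWindowingInfo;

    // remaining fields: user data as defined by the Beam schema
    // (here, a single string column for illustration)
    final String userValue;

    WindowedRowSketch(byte[] beamWindowingInfo, String userValue) {
        this.beamWindowingInfo = beamWindowingInfo;
        this.userValue = userValue;
    }

    public static void main(String[] args) {
        // Pretend this blob is the encoded window + timestamp + pane info.
        byte[] encodedWindowing =
                "window=[0,60);ts=42".getBytes(StandardCharsets.UTF_8);
        WindowedRowSketch row = new WindowedRowSketch(encodedWindowing, "hello");

        // Spark's planner could see and optimize the user columns,
        // while the windowing blob stays opaque to it.
        System.out.println(row.userValue);
        System.out.println(new String(row.beamWindowingInfo, StandardCharsets.UTF_8));
    }
}
```

The design trade-off: user fields stay queryable and optimizable in Spark's planner, while only the windowing metadata pays the cost of opaque binary encoding.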
