25/09 looks OK. I just updated the meeting invitation to the new date. In the meantime, I will prepare a mini agenda in the shared minutes document. I cannot see the old invitees, so can someone please confirm that they see the updated date?
Thanks,
Ismaël
On Thu, Sep 19, 2019 at 2:13 PM Etienne Chauchot wrote:
>
> Hi Rahul and Xinyu,
> I just added you to the list of guests in the meeting. Time is 5pm GMT+2.
> That being said, for some reason the last scheduled meeting was 08/28. Ismael
> initially created the meeting, so I do not have the rights to add a new date.
> Ismael, can you add a date? I suggest 09/25. WDYT?
>
> Best
> Etienne
>
> On Thursday, 19 September 2019 at 00:49 +0530, rahul patwari wrote:
>
> Hi,
>
> I would love to join the call.
> Can you also share the meeting invitation with me?
>
> Thanks,
> Rahul
>
> On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu wrote:
>
> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please
> forward the meeting info to me. I am based in California, US, and hopefully
> the time will work :).
>
> Thanks,
> Xinyu
>
> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot wrote:
>
> Hi Xinyu,
>
> Thanks for offering to help! My comments are inline:
>
> On Friday, 13 September 2019 at 12:16 -0700, Xinyu Liu wrote:
>
> Hi, Etienne,
>
> The slides are very informative! Thanks for sharing the details about how the
> Beam APIs are mapped onto Spark Structured Streaming.
>
> Thanks!
>
> We (LinkedIn) are also interested in trying the new SparkRunner to run Beam
> pipelines in batch, and in contributing to it too. From my understanding, it seems
> the functionality on the batch side is mostly complete and covers quite a large
> percentage of the tests (with a few missing pieces, such as state and timers in
> ParDo, and SDF).
>
> Correct, it passes 89% of the tests, but more than SDF, state and timers is
> missing: there is also ongoing work on encoders that I would like to
> commit/push before merging.
>
> If so, is it possible to merge the new runner into master sooner, so that it's much
> easier for us to pull it in (we have an internal fork) and contribute back?
>
> Sure, see my other mail on this thread.
> As Alexey mentioned, please join the
> sync meeting we have, the more the merrier!
>
> Also curious about the schema part in the runner. It seems we can leverage the
> schema-aware work in PCollection and translate from the Beam schema to Spark, so
> it can be optimized in the planner layer. It would be great to hear your
> plans on that.
>
> Well, it is not designed yet, but if you remember my talk, we need to store
> Beam windowing information with the data itself, so we end up with a
> dataset<WindowedValue>. One lead that was discussed is to store it as a
> Spark schema such as this:
>
> 1. field1: binary data for the Beam windowing information (it cannot be mapped to
> fields because Beam windowing info is a complex structure)
>
> 2. the fields of the data as defined in the Beam schema, if there is one
>
> Congrats on this great work!
>
> Thanks!
>
> Best,
>
> Etienne
>
> Thanks,
> Xinyu
>
> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang wrote:
>
> Hello Etienne,
>
> Your slides mentioned that streaming mode development is blocked because Spark
> lacks support for multiple aggregations in its streaming mode, but that a design is
> ongoing. Do you have a link to their design discussion or doc?
>
> -Rui
>
> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot wrote:
>
> Hi Rahul,
> Sure, and great! Thanks for offering!
> If you want details, here is the presentation I gave 30 minutes ago at
> ApacheCon. You will find the video on YouTube shortly, but in the meantime,
> here are my presentation slides.
>
> And here is the structured streaming branch. I'll be happy to review your
> PRs, thanks!
>
> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>
> Best
> Etienne
>
> On Wednesday, 11 September 2019 at 16:37 +0530, rahul patwari wrote:
>
> Hi Etienne,
>
> I came to know about the work going on in the Structured Streaming Spark Runner
> from the Apache Beam Wiki - Works in Progress.
> I have contributed to BeamSql earlier.
> And I am working on supporting
> PCollectionView in BeamSql.
>
> I would love to understand the runner side of Apache Beam and contribute to
> the Structured Streaming Spark Runner.
>
> Can you please point me in the right direction?
>
> Thanks,
> Rahul
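For reference, here is a minimal Java sketch of the row layout Etienne describes in the thread: one opaque binary column holding the windowing metadata, followed by the data fields from the Beam schema. It deliberately uses no Beam or Spark classes. WindowInfo, encode, decode and toRow are hypothetical names, and the byte layout (timestamp, window bounds, a single pane index) is an assumption for illustration, not the encoding Beam's own WindowedValue coder uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch of the proposed row layout:
//   column 0    -> opaque binary blob with windowing metadata
//   column 1..n -> data fields as defined by the Beam schema
// No Beam or Spark classes are used; names and byte layout are assumptions.
public class WindowedRowSketch {

  // Simplified stand-in for Beam windowing metadata (not Beam's real classes).
  static final class WindowInfo {
    final long timestampMillis;
    final long[] windowStarts; // window start, epoch millis
    final long[] windowEnds;   // window end, epoch millis
    final int paneIndex;       // crude simplification of Beam's PaneInfo

    WindowInfo(long timestampMillis, long[] windowStarts, long[] windowEnds, int paneIndex) {
      if (windowStarts.length != windowEnds.length) {
        throw new IllegalArgumentException("each window needs a start and an end");
      }
      this.timestampMillis = timestampMillis;
      this.windowStarts = windowStarts.clone();
      this.windowEnds = windowEnds.clone();
      this.paneIndex = paneIndex;
    }
  }

  // Serializes the windowing metadata into the single binary column.
  static byte[] encode(WindowInfo info) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeLong(info.timestampMillis);
      out.writeInt(info.windowStarts.length);
      for (int i = 0; i < info.windowStarts.length; i++) {
        out.writeLong(info.windowStarts[i]);
        out.writeLong(info.windowEnds[i]);
      }
      out.writeInt(info.paneIndex);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Restores the windowing metadata from the binary column.
  static WindowInfo decode(byte[] blob) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
      long timestamp = in.readLong();
      int windowCount = in.readInt();
      long[] starts = new long[windowCount];
      long[] ends = new long[windowCount];
      for (int i = 0; i < windowCount; i++) {
        starts[i] = in.readLong();
        ends[i] = in.readLong();
      }
      int pane = in.readInt();
      return new WindowInfo(timestamp, starts, ends, pane);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Builds one row: windowing blob first, then the schema fields in order.
  static Object[] toRow(WindowInfo info, Object... dataFields) {
    Object[] row = new Object[dataFields.length + 1];
    row[0] = encode(info);
    System.arraycopy(dataFields, 0, row, 1, dataFields.length);
    return row;
  }

  public static void main(String[] args) {
    WindowInfo info = new WindowInfo(1_000L, new long[] {0L}, new long[] {60_000L}, 0);
    Object[] row = toRow(info, "user-42", 3L);
    WindowInfo back = decode((byte[]) row[0]);
    System.out.println(row.length + " columns, timestamp=" + back.timestampMillis
        + ", window=[" + back.windowStarts[0] + ", " + back.windowEnds[0] + ")");
  }
}
```

Keeping the windowing data as a single binary column means Spark never needs to understand its structure, while the remaining columns stay typed and visible to the planner, which matches the motivation in the thread.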
