Re: Pointers on Contributing to Structured Streaming Spark Runner

Ismaël Mejía Thu, 19 Sep 2019 08:20:03 -0700

25/09 looks OK. I just updated the meeting invitation to the new
date. I will prepare a mini agenda in the shared minutes document in the
meantime.
I cannot see the old invitees; can someone please confirm that they see
the updated date?
Thanks,
Ismaël


On Thu, Sep 19, 2019 at 2:13 PM Etienne Chauchot <[email protected]> wrote:
>
> Hi Rahul and Xinyu,
> I just added you to the list of guests in the meeting. The time is 5 pm GMT+2.
> That being said, for some reason the last scheduled meeting was 08/28. Ismael
> initially created the meeting, so I do not have the rights to add a new date.
> Ismael, can you add a date? I suggest 09/25. WDYT?
>
> Best
> Etienne
>
> On Thursday, 19 September 2019 at 00:49 +0530, rahul patwari wrote:
>
> Hi,
>
> I would love to join the call.
> Can you also share the meeting invitation with me?
>
> Thanks,
> Rahul
>
> On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu, <[email protected]> wrote:
>
> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please 
> forward the meeting info to me. I am based in California, US and hopefully 
> the time will work :).
>
> Thanks,
> Xinyu
>
> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <[email protected]> wrote:
>
> Hi Xinyu,
>
> Thanks for offering to help! My comments are inline:
>
> On Friday, 13 September 2019 at 12:16 -0700, Xinyu Liu wrote:
>
> Hi, Etienne,
>
> The slides are very informative! Thanks for sharing the details about how the
> Beam API is mapped onto Spark Structured Streaming.
>
>
> Thanks!
>
> We (LinkedIn) are also interested in trying the new SparkRunner to run Beam
> pipelines in batch mode, and in contributing to it too. From my understanding,
> the functionality on the batch side seems mostly complete and covers quite a
> large percentage of the tests (with a few missing pieces like state and timers
> in ParDo, and SDF).
>
>
> Correct, it passes 89% of the tests, but more than SDF, state and timers is
> missing: there is also ongoing encoders work that I would like to
> commit/push before merging.
>
> If so, is it possible to merge the new runner into master sooner, so it's much
> easier for us to pull it in (we have an internal fork) and contribute back?
>
>
> Sure, see my other mail on this thread. As Alexey mentioned, please join our
> sync meeting; the more the merrier!
>
>
> I'm also curious about the schema part in the runner. It seems we can leverage
> the schema-aware work in PCollection and translate from Beam schemas to Spark,
> so it can be optimized in the planner layer. It would be great to hear back
> about your plans on that.
>
>
> Well, it is not designed yet, but if you remember my talk, we need to store
> Beam windowing information with the data itself, so we end up with a
> Dataset<WindowedValue>. One lead that was discussed is to store it as a
> Spark schema such as this:
>
> 1. field1: binary data for Beam windowing information (cannot be mapped to
> fields because Beam windowing info is a complex structure)
>
> 2. fields of data as defined in the Beam schema if there is one
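The two-field layout above can be sketched in plain Java as follows. This is a minimal, hypothetical illustration, not the runner's design (which, as noted, does not exist yet): `WindowedRowSketch`, `WindowInfo`, `encodeWindowInfo`, `decodeWindowInfo` and `toRow` are invented names; the windowing info is simplified to a timestamp plus window bounds (Beam's real windowing info also carries pane info and would be encoded with Beam coders); and an `Object[]` stands in for a Spark row so the example runs on the JDK alone. In Spark itself, field 1 would presumably be a `BinaryType` column next to columns derived from the Beam schema.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WindowedRowSketch {

  // Hypothetical, simplified stand-in for Beam windowing metadata: an event
  // timestamp plus the [start, end) bounds of each window the element is in.
  // Real Beam windowing info also has pane info and uses Beam's own coders.
  static final class WindowInfo {
    final long timestampMillis;
    final List<long[]> windows;

    WindowInfo(long timestampMillis, List<long[]> windows) {
      this.timestampMillis = timestampMillis;
      this.windows = windows;
    }
  }

  // Field 1: pack the windowing metadata into opaque bytes
  // (timestamp, window count, then start/end pairs).
  static byte[] encodeWindowInfo(WindowInfo info) {
    ByteBuffer buf = ByteBuffer.allocate(
        Long.BYTES + Integer.BYTES + info.windows.size() * 2 * Long.BYTES);
    buf.putLong(info.timestampMillis);
    buf.putInt(info.windows.size());
    for (long[] w : info.windows) {
      buf.putLong(w[0]);
      buf.putLong(w[1]);
    }
    return buf.array();
  }

  // Inverse of encodeWindowInfo: only the runner needs to read field 1.
  static WindowInfo decodeWindowInfo(byte[] data) {
    ByteBuffer buf = ByteBuffer.wrap(data);
    long timestampMillis = buf.getLong();
    int count = buf.getInt();
    List<long[]> windows = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      windows.add(new long[] {buf.getLong(), buf.getLong()});
    }
    return new WindowInfo(timestampMillis, windows);
  }

  // Builds a row laid out as [windowing bytes, user field 1, user field 2, ...],
  // where the user fields would come from the Beam schema, if there is one.
  static Object[] toRow(WindowInfo info, Object... userFields) {
    Object[] row = new Object[userFields.length + 1];
    row[0] = encodeWindowInfo(info);
    System.arraycopy(userFields, 0, row, 1, userFields.length);
    return row;
  }

  public static void main(String[] args) {
    WindowInfo info = new WindowInfo(1_000L, List.of(new long[] {0L, 60_000L}));
    Object[] row = toRow(info, "alice", 42);
    WindowInfo decoded = decodeWindowInfo((byte[]) row[0]);
    System.out.println(row.length + " fields, user fields: " + row[1] + ", " + row[2]);
    System.out.println("timestamp " + decoded.timestampMillis
        + ", window " + Arrays.toString(decoded.windows.get(0)));
  }
}
```

Keeping field 1 opaque means only the runner has to decode it where windowing matters, while the schema-derived fields stay visible as ordinary columns, which is what would let the Spark planner optimize them.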
>
>
> Congrats on this great work!
>
> Thanks!
>
> Best,
>
> Etienne
>
> Thanks,
> Xinyu
>
> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <[email protected]> wrote:
>
> Hello Etienne,
>
> Your slides mentioned that streaming mode development is blocked because Spark
> lacks support for multiple aggregations in its streaming mode, but that a design
> is ongoing. Do you have a link or something else to their design discussion/doc?
>
>
> -Rui
>
> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <[email protected]> wrote:
>
> Hi Rahul,
> Sure, and great! Thanks for proposing!
> If you want details, here is the presentation I gave 30 mins ago at
> ApacheCon. The video should be on YouTube shortly, but in the meantime,
> here are my presentation slides.
>
> And here is the structured streaming branch. I'll be happy to review your 
> PRs, thanks !
>
> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>
> Best
> Etienne
>
> On Wednesday, 11 September 2019 at 16:37 +0530, rahul patwari wrote:
>
> Hi Etienne,
>
> I came to know about the work going on in the Structured Streaming Spark Runner
> from the Apache Beam Wiki (Works in Progress).
> I have contributed to BeamSql before, and I am currently working on supporting
> PCollectionView in BeamSql.
>
> I would love to understand the Runner's side of Apache Beam and contribute to 
> the Structured Streaming Spark Runner.
>
> Can you please point me in the right direction?
>
> Thanks,
> Rahul
