Hi,

I would love to join the call.
Can you also share the meeting invitation with me?

Thanks,
Rahul

On Wed 18 Sep, 2019, 11:48 PM Xinyu Liu, <[email protected]> wrote:

> Alexey and Etienne: I'm very happy to join the sync-up meeting. Please
> forward the meeting info to me. I am based in California, US and hopefully
> the time will work :).
>
> Thanks,
> Xinyu
>
> On Wed, Sep 18, 2019 at 6:39 AM Etienne Chauchot <[email protected]>
> wrote:
>
>> Hi Xinyu,
>>
>> Thanks for offering help ! My comments are inline:
>>
>> On Friday, September 13, 2019 at 12:16 -0700, Xinyu Liu wrote:
>>
>> Hi, Etienne,
>>
>> The slides are very informative! Thanks for sharing the details about how
>> the Beam API is mapped onto Spark Structured Streaming.
>>
>>
>> Thanks !
>>
>> We (LinkedIn) are also interested in trying the new SparkRunner to run
>> Beam pipelines in batch, and in contributing to it too. From my
>> understanding, it seems the functionality on the batch side is mostly
>> complete and covers quite a large percentage of the tests (with a few
>> missing pieces like state and timers in ParDo, and SDF).
>>
>>
>> Correct, it passes 89% of the tests, but more than SDF, state, and
>> timers is missing; there is also ongoing encoders work that I would like
>> to commit/push before merging.
>>
>> If so, would it be possible to merge the new runner into master sooner,
>> so it's much easier for us to pull it in (we have an internal fork) and
>> contribute back?
>>
>>
>> Sure, see my other mail on this thread. As Alexey mentioned, please join
>> the sync meeting we have, the more the merrier !
>>
>>
>> Also curious about the schema part of the runner. It seems we can
>> leverage the schema-aware work in PCollection and translate from the Beam
>> schema to Spark's, so it can be optimized in the planner layer. It would
>> be great to hear about your plans on that.
>>
>>
>> Well, it is not designed yet, but if you remember my talk, we need to
>> store Beam windowing information with the data itself, so we end up with
>> a Dataset<WindowedValue>. One lead that was discussed is to store it as a
>> Spark schema such as this:
>>
>> 1. field1: binary data for the Beam windowing information (it cannot be
>> mapped to fields because the Beam windowing info is a complex structure)
>>
>> 2. fields of data as defined in the Beam schema if there is one
>>
>>
>> Congrats on this great work!
>>
>> Thanks !
>>
>> Best,
>>
>> Etienne
>>
>> Thanks,
>> Xinyu
>>
>> On Wed, Sep 11, 2019 at 6:02 PM Rui Wang <[email protected]> wrote:
>>
>> Hello Etienne,
>>
>> Your slides mentioned that streaming-mode development is blocked because
>> Spark lacks support for multiple aggregations in its streaming mode, but
>> a design is ongoing. Do you have a link or something else pointing to
>> their design discussion/doc?
>>
>>
>> -Rui
>>
>> On Wed, Sep 11, 2019 at 5:10 PM Etienne Chauchot <[email protected]>
>> wrote:
>>
>> Hi Rahul,
>> Sure, and great ! Thanks for proposing !
>> If you want details, here is the presentation I did 30 minutes ago at
>> ApacheCon. You will find the video on YouTube shortly but in the meantime,
>> here are my presentation slides.
>>
>> And here is the structured streaming branch. I'll be happy to review your
>> PRs, thanks !
>>
>> https://github.com/apache/beam/tree/spark-runner_structured-streaming
>>
>> Best
>> Etienne
>>
>> On Wednesday, September 11, 2019 at 16:37 +0530, rahul patwari wrote:
>>
>> Hi Etienne,
>>
>> I came to know about the ongoing work on the Structured Streaming Spark
>> Runner from the Apache Beam Wiki - Works in Progress.
>> I have contributed to BeamSql earlier, and I am working on supporting
>> PCollectionView in BeamSql.
>>
>> I would love to understand the Runner's side of Apache Beam and
>> contribute to the Structured Streaming Spark Runner.
>>
>> Can you please point me in the right direction?
>>
>> Thanks,
>> Rahul
>>
>>
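The row layout Etienne describes in the thread above (an opaque binary column carrying the Beam windowing metadata, followed by the data fields from the Beam schema) can be illustrated with a small sketch. This is only a hedged, runner-agnostic illustration: the names `WindowingInfo`, `beam_windowing_info`, `to_row`, and `from_row` are hypothetical, and `pickle` merely stands in for Beam's actual windowing coder.

```python
# Sketch of the proposed Spark row layout: field 1 holds the Beam windowing
# metadata as opaque bytes (it is a complex structure that cannot be mapped
# to plain fields), and the remaining fields hold the user's data as defined
# by the Beam schema. All names here are illustrative, not the runner's API.
import pickle
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class WindowingInfo:
    """Hypothetical stand-in for Beam's windowing metadata (window, timestamp)."""
    window: str
    timestamp_ms: int


def to_row(value: Dict[str, Any], info: WindowingInfo) -> Dict[str, Any]:
    """Build a row: one binary column for windowing info, plus the data fields."""
    return {"beam_windowing_info": pickle.dumps(info), **value}


def from_row(row: Dict[str, Any]) -> Tuple[Dict[str, Any], WindowingInfo]:
    """Split a row back into the data fields and the decoded windowing info."""
    info = pickle.loads(row["beam_windowing_info"])
    data = {k: v for k, v in row.items() if k != "beam_windowing_info"}
    return data, info


# Round-trip one element through the sketched layout.
row = to_row({"user_id": 42, "score": 3.5},
             WindowingInfo(window="[0,60s)", timestamp_ms=30_000))
data, info = from_row(row)
```

In an actual Spark Dataset the binary column would be a `BinaryType` field in the `StructType`, letting the planner still optimize over the typed data columns while the windowing bytes pass through untouched.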
