Cool! I don't quite understand the issue in "bytes serialization to comply to spark dataset schemas to store windowedValues". Can you say a little more?
Kenn

On Tue, Jan 15, 2019 at 8:54 AM Etienne Chauchot <[email protected]> wrote:

> Hi guys,
>
> regarding the new (made from scratch) spark runner POC based on the
> dataset API, I was able to make a big step forward: it can now run a first
> batch pipeline with a source !
>
> See
> https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SourceTest.java
>
> There is no test facilities for now, testmode is enabled and it just
> prints the output PCollection.
>
> I made some workarounds, especially String serialization to pass beam
> objects (was forced to) and also bytes serialization to comply to spark
> dataset schemas to store windowedValues.
>
> Can you give me your thoughts, especially regarding these last 2 matters?
>
> The other parts are not ready for showing yet.
>
> Here is the whole branch:
>
> https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming
>
> Thanks,
>
> Etienne
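For readers following the thread: the workaround Etienne mentions is, as I understand it, that a Spark Dataset schema cannot represent an arbitrary Beam type such as `WindowedValue<T>` directly, so each element is serialized to `byte[]` (a type Spark's binary encoder handles) and deserialized when the runner needs it. Below is a minimal, self-contained sketch of that round-trip pattern. Note the assumptions: `FakeWindowedValue` is a hypothetical stand-in for Beam's `WindowedValue`, and plain Java serialization stands in for Beam coders plus Spark's `Encoders.BINARY()` so the example runs without Beam or Spark on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BytesRoundTrip {

    // Hypothetical stand-in for Beam's WindowedValue:
    // an element plus its window/timestamp metadata.
    static class FakeWindowedValue implements Serializable {
        final String value;
        final long timestampMillis;

        FakeWindowedValue(String value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    // Encode to byte[] -- the shape a Dataset<byte[]> column can hold.
    static byte[] toBytes(FakeWindowedValue wv) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(wv);
        }
        return bos.toByteArray();
    }

    // Decode back when the runner needs the element again.
    static FakeWindowedValue fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (FakeWindowedValue) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeWindowedValue original = new FakeWindowedValue("hello", 1547568840000L);
        FakeWindowedValue restored = fromBytes(toBytes(original));
        if (!restored.value.equals(original.value)
                || restored.timestampMillis != original.timestampMillis) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("round trip ok: " + restored.value);
    }
}
```

The cost of this approach is that Spark sees only opaque bytes, so Catalyst cannot optimize over the element's fields; that trade-off is presumably part of what Etienne is asking for feedback on.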
