Cool! I don't quite understand the issue in "bytes serialization to comply to spark dataset schemas to store windowedValues". Can you say a little more?
Kenn

On Tue, Jan 15, 2019 at 8:54 AM Etienne Chauchot <[email protected]> wrote:

> Hi guys,
>
> regarding the new (made from scratch) spark runner POC based on the
> dataset API, I was able to make a big step forward: it can now run a first
> batch pipeline with a source !
>
> See
> https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SourceTest.java
>
> There is no test facilities for now, testmode is enabled and it just
> prints the output PCollection.
>
> I made some workarounds, especially String serialization to pass beam
> objects (was forced to) and also bytes serialization to comply to spark
> dataset schemas to store windowedValues.
>
> Can you give me your thoughts, especially regarding these last 2 matters?
>
> The other parts are not ready for showing yet.
>
> Here is the whole branch:
>
> https://github.com/apache/beam/blob/spark-runner_structured-streaming/runners/spark-structured-streaming
>
> Thanks,
>
> Etienne
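For readers following the thread: the workaround Etienne mentions is, as I understand it, that a Spark Dataset schema cannot represent an arbitrary Beam type such as `WindowedValue<T>` directly, so each element is serialized to `byte[]` (a type Spark's binary encoder handles) and deserialized when the runner needs it. Below is a minimal, self-contained sketch of that round-trip pattern. Note the assumptions: `FakeWindowedValue` is a hypothetical stand-in for Beam's `WindowedValue`, and plain Java serialization stands in for Beam coders plus Spark's `Encoders.BINARY()` so the example runs without Beam or Spark on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BytesRoundTrip {

    // Hypothetical stand-in for Beam's WindowedValue:
    // an element plus its window/timestamp metadata.
    static class FakeWindowedValue implements Serializable {
        final String value;
        final long timestampMillis;

        FakeWindowedValue(String value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }
    }

    // Encode to byte[] -- the shape a Dataset<byte[]> column can hold.
    static byte[] toBytes(FakeWindowedValue wv) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(wv);
        }
        return bos.toByteArray();
    }

    // Decode back when the runner needs the element again.
    static FakeWindowedValue fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (FakeWindowedValue) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeWindowedValue original = new FakeWindowedValue("hello", 1547568840000L);
        FakeWindowedValue restored = fromBytes(toBytes(original));
        if (!restored.value.equals(original.value)
                || restored.timestampMillis != original.timestampMillis) {
            throw new AssertionError("round trip failed");
        }
        System.out.println("round trip ok: " + restored.value);
    }
}
```

The cost of this approach is that Spark sees only opaque bytes, so Catalyst cannot optimize over the element's fields; that trade-off is presumably part of what Etienne is asking for feedback on.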
