+1

As the new runner seems almost "equivalent" to the one we have, it makes sense.

The question is: do we keep the "old" Spark runner around for a while, or
just keep it on a previous version/tag in git?

Regards
JB

On 10/10/2019 09:39, Etienne Chauchot wrote:
> Hi guys,
> 
> You probably know that work has been ongoing for several months on a
> new Spark runner based on the Spark Structured Streaming framework.
> This work is located in a feature branch here:
> https://github.com/apache/beam/tree/spark-runner_structured-streaming
> 
> To attract more contributors and get some user feedback, we think it is
> time to merge it to master. Before doing so, some steps need to be
> completed:
> 
> - finish the work on Spark Encoders (which allow Beam coders to be
> called), because right now the runner is in an unstable state: some
> transforms use the new serialization/deserialization mechanism and some
> use the old one, making a pipeline inconsistent with regard to
> serialization
> 
> - clean up the history: it contains commits dating back to November
> 2018, so there is a good amount of work and consequently a large number
> of commits. They were already squashed, but not those from September
> 2019 onward
> 
> Regarding status:
> 
> - the runner passes 89% of the ValidatesRunner tests in batch mode. We
> hope to pass more once the new Encoders are in place
> 
> - Streaming mode is barely started (we are waiting for multi-aggregation
> support in the Spark Structured Streaming framework from the Spark
> community)
> 
> - The runner can execute the Nexmark benchmark suite
> 
> - Some things are not wired up yet
> 
>     - Beam Schemas not wired with Spark Schemas
> 
>     - Optional features of the model not implemented: state API, timer
> API, splittable DoFn API, …
> 
> WDYT, can we merge it to master once the two steps above are done?
> 
> Best
> 
> Etienne
> 

-- 
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
