I forgot to thank everyone for their contributions to this, especially Alexey, Ryan and Ismael.

Etienne

On 20/11/2019 17:12, Etienne Chauchot wrote:
Hi all,

I'm glad to announce that the new Spark runner, based on the Spark Structured Streaming framework, has been merged into master!

It is not based on the RDD/DStream API. See https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

It is still experimental, and its coverage of the Beam model is partial:

- The runner passes 95% of the ValidatesRunner tests in batch mode.

- It does not support streaming yet (this is waiting on multi-aggregation support in the Spark Structured Streaming framework from the Spark community).

- The runner can execute Nexmark; the perfkit dashboards are yet to come.

- Some things are not wired up yet:

    - Beam Schemas are not supported

    - Optional features of the model are not implemented: State API, Timer API, splittable DoFn API, …

I will submit a PR to update the capability matrix in the coming days.

Best

Etienne

