[DISCUSSION] using NexMark for Beam

Etienne Chauchot Tue, 21 Mar 2017 09:39:08 -0700

Hi all,

Ismael and I are working on upgrading the Nexmark implementation for Beam. See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and https://issues.apache.org/jira/browse/BEAM-160. We are continuing the work done by Mark Shields. See https://github.com/apache/beam/pull/366 for the original PR.

The PR contains queries that have a wide coverage of the Beam model and that represent a realistic end user use case (some come from client experience on Google Cloud Dataflow).

So far, we have upgraded the implementation to the latest Beam snapshot, and we are able to execute a good subset of the queries on the different runners. To do so, we upgraded the Nexmark drivers: the direct driver (upgraded from inProcessDriver) and the Flink driver, and we added a new one for Spark.

There is still a good amount of work to do, and we would like to know if you think this contribution could eventually have its place in Beam.


The benefits of having Nexmark on Beam that we have seen so far are:

- Rich batch/streaming test

- A-B testing of runners or runtimes (non-regression, performance comparison between versions ...)


- Integration testing (sdk/runners, runner/runtime, ...)

- Validation of the Beam capability matrix

- It can be used as part of the ongoing PerfKit work (if there is any interest).

As a final note, we are tracking the issues in the same repo. If anyone is interested in contributing, or has more ideas, you are welcome :)


Etienne
