Hi guys,
I wanted to let you know that I have just submitted a PR around NexMark.
This is a port of the NexMark queries to Beam, to be used as integration
tests.
This can also be used for A/B testing (non-regression or performance
comparison between 2 versions of the same engine or of the same runner).
This is a continuation of the previous PR (#99) from Mark Shields.
The code has changed quite a bit: some queries have changed to use new
Beam APIs and there were some big refactorings. More importantly, we can
now run all the queries in all the runners.
Nevertheless, there are still some open issues in Nexmark
(https://github.com/iemejia/beam/issues) and in Beam upstream (see issue
links in https://issues.apache.org/jira/browse/BEAM-160).
I wanted to submit the PR before the NexMark talk that Ismaël and I are
giving at ApacheCon. The PR is not perfect, but it is in good enough shape
to share.
Best,
Etienne
On 22/03/2017 at 04:51, Kenneth Knowles wrote:
This is great! Having a variety of realistic-ish pipelines running on all
runners complements the validation suite and IO IT work.
If I recall, some of these involve heavy and esoteric uses of state, so
definitely give me a ping if you hit any trouble.
Kenn
On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <echauc...@gmail.com>
wrote:
Hi all,
Ismael and I are working on upgrading the Nexmark implementation for Beam.
See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
work done by Mark Shields. See https://github.com/apache/beam/pull/366
for the original PR.
The PR contains queries that have a wide coverage of the Beam model and
that represent a realistic end user use case (some come from client
experience on Google Cloud Dataflow).
So far, we have upgraded the implementation to the latest Beam snapshot,
and we are able to execute a good subset of the queries on the different
runners. To do so, we upgraded the Nexmark drivers: the direct driver
(upgraded from inProcessDriver) and the Flink driver, and we added a new
one for Spark.
There is still a good amount of work to do, and we would like to know if
you think that this contribution could eventually have a place in Beam.
The benefits of having Nexmark on Beam that we have seen so far are:
- Rich batch/streaming test
- A-B testing of runners or runtimes (non-regression, performance
comparison between versions ...)
- Integration testing (sdk/runners, runner/runtime, ...)
- Validation of the Beam capability matrix
- It can be used as part of the ongoing PerfKit work (if there is any
interest).
As a final note, we are tracking the issues in the same repo. If anyone
is interested in contributing, or has more ideas, you are welcome :)
Etienne