I think these are valuable enough that we should get them into apache/master.
On Fri, May 12, 2017 at 4:34 AM, Jean-Baptiste Onofré <[email protected]> wrote:

> Hi,
>
> PR or even a feature branch could work. Up to you.
>
> Regards
> JB
>
> On 05/12/2017 10:55 AM, Etienne Chauchot wrote:
>
>> Hi guys,
>>
>> I wanted to let you know that I have just submitted a PR around NexMark.
>> This is a port of the NexMark queries to Beam, to be used as integration
>> tests. It can also be used for A-B testing (non-regression or performance
>> comparison between 2 versions of the same engine or of the same runner).
>>
>> This is a continuation of the previous PR (#99) from Mark Shields.
>> The code has changed quite a bit: some queries have changed to use new
>> Beam APIs and there were some big refactorings. More importantly, we can
>> now run all the queries in all the runners.
>>
>> Nevertheless, there are still some open issues in Nexmark
>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see issue
>> links in https://issues.apache.org/jira/browse/BEAM-160).
>>
>> I wanted to submit the PR before our (Ismaël and I) NexMark talk at
>> ApacheCon. The PR is not perfect but it is in good enough shape to share.
>>
>> Best,
>>
>> Etienne
>>
>> On 22/03/2017 at 04:51, Kenneth Knowles wrote:
>>
>>> This is great! Having a variety of realistic-ish pipelines running on all
>>> runners complements the validation suite and IO IT work.
>>>
>>> If I recall, some of these involve heavy and esoteric uses of state, so
>>> definitely give me a ping if you hit any trouble.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Ismael and I are working on upgrading the Nexmark implementation for
>>>> Beam. See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
>>>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
>>>> work done by Mark Shields. See https://github.com/apache/beam/pull/366
>>>> for the original PR.
>>>>
>>>> The PR contains queries that have a wide coverage of the Beam model and
>>>> that represent realistic end-user use cases (some come from client
>>>> experience on Google Cloud Dataflow).
>>>>
>>>> So far, we have upgraded the implementation to the latest Beam snapshot,
>>>> and we are able to execute a good subset of the queries in the different
>>>> runners. We upgraded the nexmark drivers to do so: direct driver
>>>> (upgraded from inProcessDriver) and flink driver, and we added a new one
>>>> for spark.
>>>>
>>>> There is still a good amount of work to do and we would like to know if
>>>> you think that this contribution can have its place in Beam eventually.
>>>>
>>>> The benefits of having Nexmark on Beam that we have seen so far are:
>>>>
>>>> - Rich batch/streaming test
>>>>
>>>> - A-B testing of runners or runtimes (non-regression, performance
>>>>   comparison between versions ...)
>>>>
>>>> - Integration testing (sdk/runners, runner/runtime, ...)
>>>>
>>>> - Validate beam capability matrix
>>>>
>>>> - It can be used as part of the ongoing PerfKit work (if there is any
>>>>   interest).
>>>>
>>>> As a final note, we are tracking the issues in the same repo. If anyone
>>>> is interested in contributing or has more ideas, you are welcome :)
>>>>
>>>> Etienne
>>>>
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
