I think these are valuable enough that we should get them into apache/master.

On Fri, May 12, 2017 at 4:34 AM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi,
>
> PR or even a feature branch could work. Up to you.
>
> Regards
> JB
>
>
> On 05/12/2017 10:55 AM, Etienne Chauchot wrote:
>
>> Hi guys,
>>
>> I wanted to let you know that I have just submitted a PR around NexMark.
>> This is a port of the NexMark queries to Beam, to be used as integration
>> tests. It can also be used for A-B testing (non-regression or performance
>> comparison between two versions of the same engine or of the same runner).
>>
>> This is a continuation of the previous PR (#99) from Mark Shields.
>> The code has changed quite a bit: some queries have changed to use new
>> Beam APIs, and there were some big refactorings. More importantly, we can
>> now run all the queries on all the runners.
>>
>> Nevertheless, there are still some open issues in Nexmark
>> (https://github.com/iemejia/beam/issues) and in Beam upstream (see the
>> issue links in https://issues.apache.org/jira/browse/BEAM-160).
>>
>> I wanted to submit the PR before our (Ismaël's and my) NexMark talk at
>> ApacheCon. The PR is not perfect, but it is in good enough shape to share.
>>
>> Best,
>>
>> Etienne
>>
>>
>>
>> On 22/03/2017 at 04:51, Kenneth Knowles wrote:
>>
>>> This is great! Having a variety of realistic-ish pipelines running on all
>>> runners complements the validation suite and IO IT work.
>>>
>>> If I recall, some of these involve heavy and esoteric uses of state, so
>>> definitely give me a ping if you hit any trouble.
>>>
>>> Kenn
>>>
>>> On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Ismael and I are working on upgrading the Nexmark implementation for
>>>> Beam.
>>>> See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
>>>> https://issues.apache.org/jira/browse/BEAM-160. We are continuing the
>>>> work done by Mark Shields. See https://github.com/apache/beam/pull/366
>>>> for the original PR.
>>>>
>>>> The PR contains queries that cover a wide range of the Beam model and
>>>> that represent realistic end-user use cases (some come from client
>>>> experience on Google Cloud Dataflow).
>>>>
>>>> So far, we have upgraded the implementation to the latest Beam snapshot,
>>>> and we are able to execute a good subset of the queries on the different
>>>> runners. To do so, we upgraded the Nexmark drivers (the direct driver,
>>>> upgraded from inProcessDriver, and the Flink driver) and added a new one
>>>> for Spark.
>>>>
>>>> There is still a good amount of work to do, and we would like to know
>>>> whether you think this contribution could eventually have a place in Beam.
>>>>
>>>> The benefits of having Nexmark in Beam that we have identified so far are:
>>>>
>>>> - Rich batch/streaming tests
>>>>
>>>> - A-B testing of runners or runtimes (non-regression, performance
>>>> comparison between versions ...)
>>>>
>>>> - Integration testing (sdk/runners, runner/runtime, ...)
>>>>
>>>> - Validation of the Beam capability matrix
>>>>
>>>> - It can be used as part of the ongoing PerfKit work (if there is any
>>>> interest).
>>>>
>>>> As a final note, we are tracking the issues in the same repo. If anyone
>>>> is interested in contributing or has more ideas, you are welcome :)
>>>>
>>>> Etienne
>>>>
>>>>
>>>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
