I really like these. Happy to have them.

Best
-P.

On Fri, Mar 15, 2019 at 11:16 AM Łukasz Gajowy <[email protected]> wrote:
> Hi Beamers,
>
> an update on this. Together with Kasia, Michał, and cooperating closely
> with Pablo, we have created and scheduled a cron job that runs 7 tests
> for GroupByKey batch scenarios daily. The tests are described in the
> proposal [1] and will be documented later. The dashboards for the tests:
> - run times [2]
> - total load size in bytes [3]
>
> All the metrics are collected using Beam's Metrics API.
>
> Things we have on our horizon:
> - the same set of tests for Java, but in streaming mode
> - similar jobs for the Python SDK
> - running similar suites on the Flink runner
>
> We have also created a set of Dataproc bash scripts that can be used to
> set up a Flink cluster that supports portability [4]. It is ready to use,
> and I've already successfully run the word count example on it using the
> Python SDK. I'm hoping and aiming to run load tests on it soon. :)
>
> Last but not least: we also reused some code to collect metrics through
> the Metrics API in TextIOIT, and we are willing to make a similar change
> to the other IOITs. Dashboards for TextIOIT: [5].
>
> Thanks,
> Łukasz
>
> [1] https://s.apache.org/load-test-basic-operations
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
> [3] https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184
> [4] https://github.com/apache/beam/blob/b1ed061fd0c1ed1da562089c939d55715907769d/.test-infra/dataproc/create_flink_cluster.sh
> [5] https://apache-beam-testing.appspot.com/explore?dashboard=5629522644828160
>
> On Wed, Sep 12, 2018 at 2:23 PM Etienne Chauchot <[email protected]> wrote:
>
>> Let me elaborate a bit on my last sentence.
>>
>> On Tuesday, September 11, 2018 at 11:29 +0200, Etienne Chauchot wrote:
>>
>> Hi Lukasz,
>>
>> Well, having low-level byte[]-based pure performance tests makes sense.
>> And having a high-level realistic model (the Nexmark auction system) also
>> makes sense, to avoid testing unrealistic pipelines as you describe.
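[The dashboards above track run time and total load size in bytes, both collected through Beam's Metrics API. As a rough, stdlib-only Python sketch of what the byte-count metric amounts to for KV<byte[], byte[]> load tests — not actual Beam code; the helper names are illustrative:]

```python
import os

def make_record(key_size, value_size):
    """Generate one synthetic KV record as a (key_bytes, value_bytes) pair."""
    return (os.urandom(key_size), os.urandom(value_size))

def total_load_bytes(records):
    """Sum the sizes of all keys and values, mimicking a total-bytes metric."""
    return sum(len(k) + len(v) for k, v in records)

# 1000 records, each with a 10-byte key and a 90-byte value.
records = [make_record(10, 90) for _ in range(1000)]
print(total_load_bytes(records))  # 1000 records x 100 bytes = 100000
```

[In actual Beam pipelines the equivalent counter would be incremented per element inside a step and aggregated by the runner, rather than computed over an in-memory list.]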
>>
>> Having common code between the two seems difficult, as both the
>> architecture and the model are different.
>>
>> I'm more concerned about having two CI mechanisms to detect
>> functional/performance regressions.
>>
>> Even if parts of Nexmark and the performance tests are the same, they
>> could target different objectives: raw performance tests (the new
>> framework) and user-oriented tests (Nexmark). So they might be
>> complementary.
>>
>> We just have to choose how to run them. I think we need only one
>> automatic regression detection tool. IMHO, the most relevant one for
>> functional/performance regressions is Nexmark, because it represents what
>> a real user could do (it simulates an auction system). So let's keep it
>> in the post-commits. Post-commits make it possible to pinpoint the
>> particular commit that introduced a regression.
>>
>> We could run the new performance tests on a schedule.
>>
>> Best
>> Etienne
>>
>> On Monday, September 10, 2018 at 18:33 +0200, Łukasz Gajowy wrote:
>>
>> In my opinion, and as far as I understand Nexmark, there are benefits to
>> having both types of tests. The load tests we propose can be very
>> straightforward and clearly show what is being tested, thanks to the fact
>> that there is no fixed model but only very "low-level" KV<byte[], byte[]>
>> collections. They are more flexible in the shapes of the pipelines they
>> can express, e.g. fanout_64, without having to think about specific use
>> cases.
>>
>> Having both types would allow developers to decide whether they want to
>> create a new Nexmark query for their specific case or develop a new load
>> test (whichever is easier and fits their case better). However, there is
>> a risk: with KV<byte[], byte[]>, a developer can overemphasize cases that
>> never happen in practice, so we need to be careful about the exact
>> configurations we run.
>>
>> Still, I can imagine that there will surely be code common to both types
>> of tests, and we will seek ways to avoid duplicating it.
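[Łukasz mentions pipeline shapes such as fanout_64 over plain KV<byte[], byte[]> collections. As a hedged, stdlib-only Python sketch of the idea (the function name is mine, not Beam's): a fanout step replicates each input element N times downstream, multiplying the load on the following transforms.]

```python
def fanout(records, n):
    """Replicate each record n times, as a fanout_<n> pipeline shape would."""
    for record in records:
        for _ in range(n):
            yield record

# 10 identical KV<byte[], byte[]> inputs pushed through a fanout of 64.
inputs = [(b"key", b"value")] * 10
outputs = list(fanout(inputs, 64))
print(len(outputs))  # 10 inputs x 64 = 640 elements
```

[In a real load test the fanout would be a ParDo emitting n copies per element, and the downstream GroupByKey or Combine would then see n times the input volume.]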
>>
>> WDYT?
>>
>> Regards,
>> Łukasz
>>
>> On Mon, Sep 10, 2018 at 4:36 PM Etienne Chauchot <[email protected]> wrote:
>>
>> Hi,
>> It seems that there is a notable overlap with what Nexmark already does:
>> Nexmark measures performance and regressions by exercising the whole Beam
>> model in both batch and streaming modes with several runners. It also
>> computes on synthetic data. Nexmark is also already included in the CI as
>> post-commits, with dashboards.
>>
>> Shall we merge the two?
>>
>> Best
>>
>> Etienne
>>
>> On Monday, September 10, 2018 at 12:56 +0200, Łukasz Gajowy wrote:
>>
>> Hello everyone,
>>
>> thank you for all your comments on the proposal. To sum up:
>>
>> A set of performance tests exercising the core Beam transforms (ParDo,
>> GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and
>> Python SDKs. Those tests will make it possible to:
>>
>> - measure the performance of the transforms on various runners
>> - exercise the transforms by creating stressful conditions and big loads
>> using the Synthetic Source and Synthetic Step API (delays, keeping the
>> CPU busy or asleep, processing large keys and values, performing fanout
>> or reiteration of inputs)
>> - run in both batch and streaming contexts
>> - gather various metrics
>> - notice regressions by comparing data from consecutive Jenkins runs
>>
>> Metrics (runtime, consumed bytes, memory usage, split/bundle count) can
>> be gathered during test invocations. We will start with runtime and
>> leverage the Metrics API to collect the other metrics in later phases of
>> development.
>> The tests will be fully configurable through pipeline options, and it
>> will be possible to run any custom scenario manually. However, a
>> representative set of testing scenarios will be run periodically using
>> Jenkins.
>>
>> Regards,
>> Łukasz
>>
>> On Wed, Sep 5, 2018 at 8:31 PM Rafael Fernandez <[email protected]> wrote:
>>
>> neat!
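[The summary above describes a Synthetic Source used to create stressful conditions: large keys and values, delays, keeping the CPU busy or asleep. A stdlib-only Python sketch of the core idea (the parameter names are illustrative, not the actual Synthetic Source API):]

```python
import os
import time

def synthetic_source(num_records, key_size, value_size, delay_per_record=0.0):
    """Yield num_records synthetic KV pairs of the requested sizes,
    optionally sleeping per record to simulate a slow source."""
    for _ in range(num_records):
        if delay_per_record:
            time.sleep(delay_per_record)
        yield (os.urandom(key_size), os.urandom(value_size))

# 100 records with 10-byte keys and 90-byte values, no artificial delay.
records = list(synthetic_source(100, key_size=10, value_size=90))
print(len(records), len(records[0][0]), len(records[0][1]))  # 100 10 90
```

[The real API is richer (configurable key distributions, CPU-burning steps, splittable sources), but tuning knobs like these is what lets one test sweep from tiny hot keys to huge skewed values without writing a new pipeline.]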
left a comment or two
>>
>> On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <[email protected]> wrote:
>>
>> Hi all!
>>
>> I'm bumping this (in case you missed it). Any feedback and questions are
>> welcome!
>>
>> Best regards,
>> Łukasz
>>
>> On Mon, Aug 13, 2018 at 1:51 PM Jean-Baptiste Onofré <[email protected]> wrote:
>>
>> Hi Lukasz,
>>
>> Thanks for the update; the abstract looks promising.
>>
>> Let me take a look at the doc.
>>
>> Regards
>> JB
>>
>> On 13/08/2018 13:24, Łukasz Gajowy wrote:
>> > Hi all,
>> >
>> > since the Synthetic Sources API has been introduced in the Java and
>> > Python SDKs, it can be used to test the performance of some basic
>> > Apache Beam operations (i.e. GroupByKey, CoGroupByKey, Combine, ParDo
>> > and ParDo with SideInput). This, in brief, is why we'd like to share
>> > the proposal below:
>> >
>> > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
>> >
>> > Let us know what you think in the document's comments. Thank you in
>> > advance for all the feedback!
>> >
>> > Łukasz
>>
