Re: [PROPOSAL] Test performance of basic Apache Beam operations

Etienne Chauchot Wed, 12 Sep 2018 05:13:47 -0700
@Alexey
We already do that for Nexmark using the dashboard graphs, but it is a manual,
visual check (as I did this morning for the release vote).
Etienne
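
To illustrate what an automated version of that visual check might look like, here is a minimal sketch. It is not how Nexmark or the Beam dashboards work: the function names, the window size and the 10% tolerance are all hypothetical, and it assumes one runtime (in seconds) is recorded per CI run, oldest first.

```python
from statistics import median


def detect_regression(runtimes, window=5, tolerance=0.10):
    """Return True if the newest runtime exceeds the baseline by more than `tolerance`.

    runtimes:  per-run runtimes in seconds, oldest first (assumed input format).
    window:    how many runs before the newest one form the baseline (hypothetical default).
    tolerance: allowed relative slowdown before a regression is flagged (hypothetical default).
    """
    if len(runtimes) < 2:
        return False  # nothing to compare against yet
    history = runtimes[-(window + 1):-1]  # up to `window` runs preceding the newest
    baseline = median(history)            # median is robust to a single noisy run
    return runtimes[-1] > baseline * (1 + tolerance)


def format_report(test_name, runtimes, window=5, tolerance=0.10):
    """Build a one-line, human-readable status for a single test."""
    status = "REGRESSION" if detect_regression(runtimes, window, tolerance) else "ok"
    return f"{test_name}: latest={runtimes[-1]:.2f}s status={status}"
```

A median baseline is used here so that one slow run in the history does not hide or trigger a report by itself; a real check would probably need per-runner thresholds.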
On Tuesday, 11 September 2018 at 15:05 +0200, Alexey Romanenko wrote:
> I agree that we can benefit from having two types of performance tests (low
> and high level) that complement each other. Can we detect a regression (if
> any) automatically and send a report about it? Sorry if we already do that
> for Nexmark.
> 
> > On 11 Sep 2018, at 11:29, Etienne Chauchot <[email protected]> wrote:
> > 
> > Hi Lukasz,
> > Having low-level, byte[]-based pure performance tests makes sense. Having a
> > high-level, realistic model (the Nexmark auction system) also makes sense,
> > to avoid testing unrealistic pipelines, as you describe.
> > Sharing code between the two seems difficult, as both the architecture and
> > the model are different.
> > I'm more concerned about having two CI mechanisms to detect
> > functional/performance regressions.
> > Best
> > Etienne
> > On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
> > > In my opinion, and as far as I understand Nexmark, there are some benefits
> > > to having both types of tests. The load tests we propose can be very
> > > straightforward and clearly show what is being tested, because there is no
> > > fixed model, only very "low level" KV<byte[], byte[]> collections. They
> > > are more flexible in the shapes of pipelines they can express, e.g.
> > > fanout_64, without having to think about specific use cases.
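
As a rough illustration of the "no fixed model" idea, the sketch below generates reproducible key/value pairs of raw bytes and feeds them to several independent consumers. It uses no Beam API. The names `synthetic_kv` and `fan_out` are hypothetical, and reading "fanout_64" as "the same input consumed by 64 parallel steps" is an assumption, not something the thread spells out.

```python
import random


def synthetic_kv(num_records, key_size, value_size, seed=0):
    """Yield (key, value) pairs of random bytes, reproducible for a given seed."""
    rng = random.Random(seed)  # seeded so repeated runs see identical data
    for _ in range(num_records):
        key = bytes(rng.getrandbits(8) for _ in range(key_size))
        value = bytes(rng.getrandbits(8) for _ in range(value_size))
        yield key, value


def fan_out(records, branches, step):
    """Apply `step` to the same records once per branch; return one output list per branch."""
    records = list(records)  # materialise so every branch sees the same input
    return [[step(kv) for kv in records] for _ in range(branches)]


# Example: 64 identity branches over a small synthetic input (assumed meaning of "fanout_64").
outputs = fan_out(synthetic_kv(num_records=3, key_size=4, value_size=8), 64, lambda kv: kv)
```

Because the records carry no domain meaning, the shape of the pipeline (record count, key and value sizes, number of branches) is entirely set by parameters.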
> > > 
> > > Having both types would allow developers to decide whether they want to
> > > create a new Nexmark query for their specific case or develop a new load
> > > test (whichever is easier and fits their case better). However, there is a
> > > risk: with KV<byte[], byte[]> a developer can overemphasize cases that can
> > > never happen in practice, so we need to be careful about the exact
> > > configurations we run.
> > > 
> > > Still, I can imagine that there will surely be code that should be common
> > > to both types of tests, and we are looking for ways to avoid duplicating
> > > it.
> > > 
> > > WDYT? 
> > > 
> > > Regards, 
> > > Łukasz
> > > 
> > > 
> > > 
> > > On Mon, 10 Sep 2018 at 16:36, Etienne Chauchot <[email protected]> wrote:
> > > > Hi,
> > > > It seems that there is a notable overlap with what Nexmark already does:
> > > > Nexmark measures performance and regressions by exercising the whole Beam
> > > > model in both batch and streaming modes with several runners. It also
> > > > runs on synthetic data. Nexmark is also already included as PostCommits
> > > > in the CI and in the dashboards.
> > > > Shall we merge the two?
> > > > Best
> > > > Etienne
> > > > On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
> > > > > Hello everyone, 
> > > > > 
> > > > > thank you for all your comments on the proposal. To sum up:
> > > > > 
> > > > > A set of performance tests exercising core Beam transforms (ParDo,
> > > > > GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and
> > > > > Python SDKs. Those tests will make it possible to:
> > > > > - measure performance of the transforms on various runners
> > > > > - exercise the transforms by creating stressful conditions and big
> > > > >   loads using the Synthetic Source and Synthetic Step API (delays,
> > > > >   keeping the CPU busy or asleep, processing large keys and values,
> > > > >   performing fanout or reiteration of inputs)
> > > > > - run in both batch and streaming contexts
> > > > > - gather various metrics
> > > > > - notice regressions by comparing data from consecutive Jenkins runs
> > > > > 
> > > > > Metrics (runtime, consumed bytes, memory usage, split/bundle count)
> > > > > can be gathered during test invocations. We will start with runtime and
> > > > > leverage the Metrics API to collect the other metrics in later phases
> > > > > of development.
> > > > > 
> > > > > The tests will be fully configurable through pipeline options, and it
> > > > > will be possible to run any custom scenario manually. However, a
> > > > > representative set of testing scenarios will be run periodically using
> > > > > Jenkins.
> > > > > 
> > > > > Regards, 
> > > > > Łukasz 
> > > > > 
> > > > > On Wed, 5 Sep 2018 at 20:31, Rafael Fernandez <[email protected]> wrote:
> > > > > > neat! left a comment or two
> > > > > > 
> > > > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <[email protected]> 
> > > > > > wrote:
> > > > > > > Hi all! 
> > > > > > > 
> > > > > > > I'm bumping this (in case you missed it). Any feedback and 
> > > > > > > questions are welcome!
> > > > > > > 
> > > > > > > Best regards, 
> > > > > > > Łukasz
> > > > > > > 
> > > > > > > On Mon, 13 Aug 2018 at 13:51, Jean-Baptiste Onofré
> > > > > > > <[email protected]> wrote:
> > > > > > > > Hi Lukasz,
> > > > > > > > 
> > > > > > > > Thanks for the update, and the abstract looks promising.
> > > > > > > > 
> > > > > > > > Let me take a look at the doc.
> > > > > > > > 
> > > > > > > > Regards
> > > > > > > > JB
> > > > > > > > 
> > > > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > > > > Hi all,
> > > > > > > > > 
> > > > > > > > > since the Synthetic Sources API has been introduced in the Java and
> > > > > > > > > Python SDKs, it can be used to test the performance of some basic
> > > > > > > > > Apache Beam operations (i.e. GroupByKey, CoGroupByKey, Combine,
> > > > > > > > > ParDo and ParDo with SideInput). This, in brief, is why we'd like
> > > > > > > > > to share the proposal below:
> > > > > > > > > 
> > > > > > > > > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> > > > > > > > > 
> > > > > > > > > Let us know what you think in the document's comments. Thank you
> > > > > > > > > in advance for all the feedback!
> > > > > > > > > 
> > > > > > > > > Łukasz