Re: [PROPOSAL] Test performance of basic Apache Beam operations

Etienne Chauchot Tue, 11 Sep 2018 02:30:01 -0700
Hi Lukasz,
Well, having low-level byte[]-based pure performance tests makes sense. And
having a high-level realistic model (the Nexmark auction system) also makes
sense, to avoid testing unrealistic pipelines as you describe.
Having common code between the two seems difficult, as both the architecture
and the model are different.
I'm more concerned about having two CI mechanisms to detect
functional/performance regressions.
Best
Etienne
On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
> In my opinion, and as far as I understand Nexmark, there are some benefits to
> having both types of tests. The load tests we propose can be very
> straightforward and clearly show what is being tested, because there is no
> fixed model, only very "low level" KV<byte[], byte[]> collections. They are
> more flexible in the shapes of pipelines they can express, e.g. fanout_64,
> without having to think about specific use cases.
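
To make the "no fixed model" point concrete, here is a minimal sketch in plain Python (standard library only, not the Beam API). All function names and parameters are hypothetical, and reading fanout_64 as "64 consumers of the same input" is an assumption made purely for illustration.

```python
import random


def random_bytes(rng, size):
    """Return `size` pseudo-random bytes drawn from `rng`."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def synthetic_records(num_records, key_size, value_size, num_keys, seed=0):
    """Yield (key, value) byte pairs with no domain model behind them."""
    rng = random.Random(seed)
    # A fixed pool of keys, reused round-robin across records.
    keys = [random_bytes(rng, key_size) for _ in range(num_keys)]
    for i in range(num_records):
        yield keys[i % num_keys], random_bytes(rng, value_size)


def fanout(records, branches):
    """Feed the same input to `branches` independent consumers.

    Each consumer here only counts the bytes it sees; a real load test
    would run an actual transform in each branch.
    """
    records = list(records)
    return [sum(len(k) + len(v) for k, v in records) for _ in range(branches)]
```

Changing only the sizes and the branch count gives a different pipeline shape, without having to invent a use case for it.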
> 
> Having both types would allow developers to decide whether they want to
> create a new Nexmark query for their specific case or develop a new load test
> (whichever is easier and better fits their case). However, there is a risk:
> with KV<byte[], byte[]> a developer can overemphasize cases that can never
> happen in practice, so we need to be careful about the exact configurations
> we run.
> 
> Still, I can imagine that there will surely be code that should be common to
> both types of tests, and we will look for ways not to duplicate it.
> 
> WDYT? 
> 
> Regards, 
> Łukasz
> 
> 
> 
> On Mon, 10 Sep 2018 at 16:36, Etienne Chauchot <echauc...@apache.org> wrote:
> > Hi,
> > It seems that there is a notable overlap with what Nexmark already does:
> > Nexmark measures performance and regression by exercising the whole Beam
> > model in both batch and streaming modes with several runners. It also
> > computes on synthetic data. Also, Nexmark is already included as PostCommits
> > in the CI and dashboards.
> > Shall we merge the two?
> > Best
> > Etienne
> > On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
> > > Hello everyone, 
> > > 
> > > thank you for all your comments on the proposal. To sum up: 
> > > 
> > > A set of performance tests exercising core Beam transforms (ParDo,
> > > GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and
> > > Python SDKs. Those tests will make it possible to:
> > > - measure performance of the transforms on various runners
> > > - exercise the transforms by creating stressful conditions and big loads
> > >   using the Synthetic Source and Synthetic Step API (delays, keeping the
> > >   CPU busy or asleep, processing large keys and values, performing fanout
> > >   or reiteration of inputs)
> > > - run in both batch and streaming contexts
> > > - gather various metrics
> > > - notice regressions by comparing data from consecutive Jenkins runs
> > > 
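
The last point, comparing data from consecutive runs, can be sketched roughly as follows. This is plain Python under stated assumptions: the function name, the 10% threshold, and the "relative to the previous run" rule are all hypothetical, and the proposal does not specify how the comparison will actually be done.

```python
def find_regressions(runtimes, threshold=0.10):
    """Return indexes of runs that were slower than the run before them.

    A run counts as a regression when its runtime grew by more than
    `threshold` (a fraction, e.g. 0.10 for 10%) relative to the
    previous run. `runtimes` is ordered oldest to newest.
    """
    regressions = []
    for i in range(1, len(runtimes)):
        previous, current = runtimes[i - 1], runtimes[i]
        if previous > 0 and (current - previous) / previous > threshold:
            regressions.append(i)
    return regressions
```

Comparing only against the previous run is sensitive to noise between runs; comparing against a moving average of several past runs would be one alternative.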
> > > Metrics (runtime, consumed bytes, memory usage, split/bundle count) can be
> > > gathered during test invocations. We will start with runtime and leverage
> > > the Metrics API to collect the other metrics in later phases of development.
> > > 
> > > The tests will be fully configurable through pipeline options, and it will
> > > be possible to run any custom scenario manually. However, a representative
> > > set of testing scenarios will be run periodically using Jenkins.
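
As an illustration of "configurable through options, starting with runtime", here is a minimal stand-in written with argparse and a timer from the Python standard library. It does not use Beam's pipeline options or Metrics API; the option names (`--num_records`, `--value_size`, `--per_record_delay_ms`) and both functions are hypothetical.

```python
import argparse
import time


def parse_options(argv):
    """Parse hypothetical load-test options from a list of arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical load test options")
    parser.add_argument("--num_records", type=int, default=1000)
    parser.add_argument("--value_size", type=int, default=100)
    parser.add_argument("--per_record_delay_ms", type=float, default=0.0)
    return parser.parse_args(argv)


def run_scenario(options):
    """Process synthetic values and report runtime plus bytes processed."""
    start = time.perf_counter()
    processed_bytes = 0
    for _ in range(options.num_records):
        value = bytes(options.value_size)  # zero-filled synthetic value
        if options.per_record_delay_ms:
            # Simulate a slow step by sleeping per record.
            time.sleep(options.per_record_delay_ms / 1000.0)
        processed_bytes += len(value)
    return {
        "runtime_sec": time.perf_counter() - start,
        "processed_bytes": processed_bytes,
    }
```

A custom scenario is then just a different argument list, for example `run_scenario(parse_options(["--num_records", "10", "--value_size", "4"]))`, while a periodic job would call it with a fixed, representative set of arguments.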
> > > 
> > > Regards, 
> > > Łukasz 
> > > 
> > > On Wed, 5 Sep 2018 at 20:31, Rafael Fernandez <rfern...@google.com> wrote:
> > > > neat! left a comment or two
> > > > 
> > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <lgaj...@apache.org> wrote:
> > > > > Hi all! 
> > > > > 
> > > > > I'm bumping this (in case you missed it). Any feedback and questions 
> > > > > are welcome!
> > > > > 
> > > > > Best regards, 
> > > > > Łukasz
> > > > > 
> > > > > > On Mon, 13 Aug 2018 at 13:51, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > > > Hi Lukasz,
> > > > > > > 
> > > > > > > Thanks for the update, and the abstract looks promising.
> > > > > > > 
> > > > > > > Let me take a look at the doc.
> > > > > > > 
> > > > > > > Regards
> > > > > > > JB
> > > > > > > 
> > > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > since the Synthetic Sources API has been introduced in the Java and
> > > > > > > > Python SDKs, it can be used to test some basic Apache Beam operations
> > > > > > > > (i.e. GroupByKey, CoGroupByKey, Combine, ParDo and ParDo with
> > > > > > > > SideInput) in terms of performance. This, in brief, is why we'd like
> > > > > > > > to share the proposal below:
> > > > > > > > 
> > > > > > > > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> > > > > > > > 
> > > > > > > > Let us know what you think in the document's comments. Thank you in
> > > > > > > > advance for all the feedback!
> > > > > > > > 
> > > > > > > > Łukasz