Re: [PROPOSAL] Test performance of basic Apache Beam operations

Etienne Chauchot Wed, 12 Sep 2018 05:23:21 -0700

Let me elaborate a bit on my last sentence.

On Tuesday, 11 September 2018 at 11:29 +0200, Etienne Chauchot wrote:
> Hi Lukasz,
> 
> Well, having low-level, byte[]-based pure performance tests makes sense. And
> having a high-level realistic model (the Nexmark auction system) also makes
> sense, to avoid testing unrealistic pipelines as you describe.
> 
> Having common code between the two seems difficult, as both the
> architecture and the model are different.
> 
> I'm more concerned about having two CI mechanisms to detect
> functional/performance regressions.


Even if parts of Nexmark and the performance tests are the same, they could
target different objectives: raw performance tests (the new framework) and
user-oriented tests (Nexmark). So they might be complementary.
We just need to choose how to run them. I think we need only one automatic
regression detection tool. IMHO, the most relevant for functional/performance
regressions is Nexmark, because it represents what a real user could do (it
simulates an auction system). So let's keep it as post-commits. Post-commits
allow us to pinpoint the particular commit that introduced a regression.
We could run the new performance tests on a schedule.
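As a rough illustration of the kind of single automatic regression check discussed here (the quoted summary below mentions comparing data from consecutive Jenkins runs), a minimal sketch might look like the following. It assumes a hypothetical `RegressionCheck` class that compares the latest runtime against the mean of previous runtimes with a fixed relative tolerance. It is not how Nexmark, the proposed load tests, or Jenkins actually detect regressions.

```java
import java.util.List;

/**
 * Hypothetical sketch (not part of Beam, Nexmark, or Jenkins): flags a
 * performance regression when the latest runtime exceeds the mean of
 * previous runtimes by more than a relative tolerance.
 */
public class RegressionCheck {

  /** Returns true if latestMs > mean(historyMs) * (1 + tolerance). */
  static boolean isRegression(List<Double> historyMs, double latestMs, double tolerance) {
    if (historyMs.isEmpty()) {
      return false; // no previous runs to compare against yet
    }
    double sum = 0.0;
    for (double runtime : historyMs) {
      sum += runtime;
    }
    double mean = sum / historyMs.size();
    return latestMs > mean * (1.0 + tolerance);
  }

  public static void main(String[] args) {
    // Assumed runtimes (ms) of three earlier runs; mean is 119900 ms.
    List<Double> previousRuns = List.of(120_000.0, 118_500.0, 121_200.0);
    System.out.println(isRegression(previousRuns, 119_000.0, 0.10)); // false
    System.out.println(isRegression(previousRuns, 140_000.0, 0.10)); // true
  }
}
```

A fixed tolerance over a mean is the simplest choice; a real setup would likely need to account for run-to-run noise on shared CI workers.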
Best,
Etienne

> Best,
> Etienne
> On Monday, 10 September 2018 at 18:33 +0200, Łukasz Gajowy wrote:
> > In my opinion, and as far as I understand Nexmark, there are some benefits
> > to having both types of tests. The load tests we propose can be very
> > straightforward and clearly show what is being tested, thanks to the fact
> > that there's no fixed model, only very "low level" KV<byte[], byte[]>
> > collections. They are also more flexible in the shapes of pipelines they
> > can express, e.g. fanout_64, without having to think about specific use
> > cases.
> > 
> > Having both types would allow developers to decide whether they want to
> > create a new Nexmark query for their specific case or develop a new load
> > test (whichever is easier and better fits their case). However, there is a
> > risk: with KV<byte[], byte[]> a developer can overemphasize cases that can
> > never happen in practice, so we need to be careful about the exact
> > configurations we run.
> > 
> > Still, I expect there will certainly be code that should be common to both
> > types of tests, and we should look for ways to avoid duplicating it.
> > 
> > WDYT? 
> > 
> > Regards, 
> > Łukasz
> > 
> > 
> > 
> > On Mon, 10 Sep 2018 at 16:36, Etienne Chauchot <[email protected]>
> > wrote:
> > > Hi,
> > > 
> > > It seems that there is a notable overlap with what Nexmark already does:
> > > Nexmark measures performance and regressions by exercising all of the
> > > Beam model in both batch and streaming modes with several runners. It
> > > also computes on synthetic data. Also, Nexmark is already included as
> > > PostCommits in the CI and dashboards.
> > > Shall we merge the two?
> > > Best
> > > Etienne
> > > On Monday, 10 September 2018 at 12:56 +0200, Łukasz Gajowy wrote:
> > > > Hello everyone, 
> > > > 
> > > > Thank you for all your comments on the proposal. To sum up:
> > > > 
> > > > A set of performance tests exercising core Beam transforms (ParDo,
> > > > GroupByKey, CoGroupByKey, Combine) will be implemented for the Java and
> > > > Python SDKs. Those tests will allow us to:
> > > > - measure the performance of the transforms on various runners
> > > > - exercise the transforms by creating stressful conditions and big loads
> > > >   using the Synthetic Source and Synthetic Step API (delays, keeping the
> > > >   CPU busy or asleep, processing large keys and values, performing
> > > >   fanout or reiteration of inputs)
> > > > - run in both batch and streaming contexts
> > > > - gather various metrics
> > > > - notice regressions by comparing data from consecutive Jenkins runs
> > > > 
> > > > Metrics (runtime, consumed bytes, memory usage, split/bundle count) can
> > > > be gathered during test invocations. We will start with runtime and
> > > > leverage the Metrics API to collect the other metrics in later phases
> > > > of development.
> > > > The tests will be fully configurable through pipeline options, and it
> > > > will be possible to run any custom scenario manually. However, a
> > > > representative set of testing scenarios will be run periodically using
> > > > Jenkins.
> > > > 
> > > > Regards, 
> > > > Łukasz 
> > > > 
> > > > On Wed, 5 Sep 2018 at 20:31, Rafael Fernandez <[email protected]>
> > > > wrote:
> > > > > neat! left a comment or two
> > > > > 
> > > > > On Mon, Sep 3, 2018 at 3:53 AM Łukasz Gajowy <[email protected]> 
> > > > > wrote:
> > > > > > Hi all! 
> > > > > > 
> > > > > > I'm bumping this (in case you missed it). Any feedback and 
> > > > > > questions are welcome!
> > > > > > 
> > > > > > Best regards, 
> > > > > > Łukasz
> > > > > > 
> > > > > > On Mon, 13 Aug 2018 at 13:51, Jean-Baptiste Onofré <[email protected]>
> > > > > > wrote:
> > > > > > > Hi Lukasz,
> > > > > > > 
> > > > > > > Thanks for the update; the abstract looks promising.
> > > > > > > 
> > > > > > > Let me take a look at the doc.
> > > > > > > 
> > > > > > > Regards
> > > > > > > JB
> > > > > > > 
> > > > > > > On 13/08/2018 13:24, Łukasz Gajowy wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > now that the Synthetic Sources API has been introduced in the Java
> > > > > > > > and Python SDKs, it can be used to test the performance of some
> > > > > > > > basic Apache Beam operations (i.e. GroupByKey, CoGroupByKey,
> > > > > > > > Combine, ParDo and ParDo with SideInput). This, in brief, is why
> > > > > > > > we'd like to share the following proposal:
> > > > > > > > 
> > > > > > > > https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing
> > > > > > > > 
> > > > > > > > Let us know what you think in the document's comments. Thank you
> > > > > > > > in advance for all the feedback!
> > > > > > > > 
> > > > > > > > Łukasz
