I'll add some more context and intent to my original query.

Let's say existing functionality in some application is slow:

- we profile and detect the functions that are slow (takes ~30 minutes to finish)
- we improve them and see speed benefits (brought it down to ~3 minutes)
- we record the old timings measured as baselines for each of these functionalities (say they are individual methods)
- we then use those numbers to compare the performance of the improved functionalities

So a test assertion would look like assertThat(currentSpeed, isLessOrEqualTo(expectedSpeed)), where expectedSpeed is pinned to a fixed value of approximately 10% of the original slow timing. Although, as we know, the currentSpeed values can be spiky/flaky at times (varying by small amounts).
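As a rough sketch of the pinned-budget check described above (plain Java; the class, method names, and timing values are illustrative assumptions, not from the actual test suite, which presumably uses a matcher like Hamcrest's lessThanOrEqualTo):

```java
// Hypothetical sketch of the pinned-budget assertion described above.
// Names and values are illustrative only.
public class PerfBudgetSketch {

    // Budget pinned to roughly 10% of the original slow timing.
    static double budgetMillis(double originalMillis) {
        return originalMillis * 0.10;
    }

    // Equivalent of assertThat(currentSpeed, isLessOrEqualTo(expectedSpeed)).
    static boolean withinBudget(double currentMillis, double budgetMillis) {
        return currentMillis <= budgetMillis;
    }

    public static void main(String[] args) {
        double original = 30 * 60 * 1000;        // ~30 minutes, profiled baseline
        double budget = budgetMillis(original);  // ~3 minutes
        double current = 2.5 * 60 * 1000;        // timing measured on this run
        if (!withinBudget(current, budget)) {
            throw new AssertionError(
                "perf regression: " + current + "ms > " + budget + "ms");
        }
        System.out.println("within budget: " + current + "ms <= " + budget + "ms");
    }
}
```

A single-run check like this is exactly what becomes flaky when currentSpeed is spiky, which motivates the averaging approach below.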
One way I tried to bring down the number of failing tests is taking values from multiple runs, averaging them, and then using the average to compare with the expectedSpeed - this has given much better results. (I was also advised to use the standard deviation if necessary - I haven't applied it yet.)

Does this sound like a regularly used method? Are there better ways to do this? I hope the context is clearer now.

On Sun, 16 Feb 2020 at 14:51, Mani Sarkar <[email protected]> wrote:

> Hi all
>
> I have been recently writing performance tests, and each time I reach a
> milestone I come across slightly new challenges.
>
> At first it was capturing the baselines and then pinning the tests to the
> new performance numbers.
>
> But then the question arises: how do we check if our tests are telling the
> right thing if the underlying system, or the implementation, or both, have
> an element of flakiness?
>
> Do you run them a few times and then take the average, or do you run them
> a few times and if they pass a set number of times the test is good, or
> else it's a failed test?
>
> I'm sure many of you have come across this situation when you have
> optimised a system and want to regression-proof it, and want to ensure
> that it tells you when the underlying implementation has genuinely
> regressed due to some changes.
>
> It's not cool if the performance tests randomly fail on CI/CD or on a
> local machine.
>
> Just want to know how everyone else does it, and what you think of the
> above.
>
> Regards
> Mani
> --
> @theNeomatrix369 | Blogs: https://medium.com/@neomatrix369
> | @adoptopenjdk @graalvm @graal @truffleruby | Github:
> https://github.com/neomatrix369 | Slideshare:
> https://slideshare.net/neomatrix369 | LinkedIn:
> https://uk.linkedin.com/in/mani-sarkar
>
> Don't chase success, rather aim for "Excellence", and success will come
> chasing after you!
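The averaging approach (plus the suggested standard deviation) could be sketched roughly as follows. This is a hypothetical illustration, assuming plain Java; the method names, the choice of five runs, and the k-sigma tolerance are my assumptions, not from the thread:

```java
// Hypothetical sketch: average several runs, and optionally pad the mean
// with k sample standard deviations to absorb spiky timings, before
// comparing against the pinned expectedSpeed. Names/values illustrative.
public class AveragedPerfCheck {

    static double mean(long[] samplesMillis) {
        double sum = 0;
        for (long s : samplesMillis) sum += s;
        return sum / samplesMillis.length;
    }

    // Sample standard deviation (Bessel's correction: n - 1 in the divisor).
    static double stdDev(long[] samplesMillis) {
        double m = mean(samplesMillis);
        double sq = 0;
        for (long s : samplesMillis) sq += (s - m) * (s - m);
        return Math.sqrt(sq / (samplesMillis.length - 1));
    }

    // Pass if mean + k * stdDev stays within the pinned budget.
    // k = 0 reduces to the plain averaging already tried in the thread.
    static boolean passes(long[] samplesMillis, double expectedMillis, double k) {
        return mean(samplesMillis) + k * stdDev(samplesMillis) <= expectedMillis;
    }

    public static void main(String[] args) {
        long[] runs = {150_000, 160_000, 155_000, 170_000, 152_000}; // 5 runs, ms
        double expected = 180_000;                                   // pinned budget
        System.out.println("mean=" + mean(runs) + " sd=" + stdDev(runs)
                + " pass=" + passes(runs, expected, 1.0));
    }
}
```

The k parameter trades flakiness against sensitivity: a larger k tolerates more run-to-run variance but will also take longer to flag a genuine regression.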
