Since I cited several papers in my essay below, I might as well add the latest one, which describes our use of automatic change point detection inside DataStax. We've indirectly been testing Cassandra 4.0 with this method for over a year already, as we run change detection against an internal fork of 4.0.
https://arxiv.org/abs/2301.03034

henrik

On Sun, Jan 8, 2023 at 5:12 AM Henrik Ingo <henrik.i...@datastax.com> wrote:

Hi Josh, all

I'm sitting at an airport, so rather than participating in the comment threads in the doc, I will just post some high-level principles I've derived during my own long career in performance testing.

Infra:
- It's a common myth that you need to use on-premise HW because cloud HW is noisy.
- Most likely the opposite is true: a small cluster of lab hardware runs the risk of some sysadmin with root access manually modifying the servers and leaving them in an inconsistent configuration. Otoh a public cloud is configured with infrastructure as code, so every change is tracked in version control.
- Four-part article on how we tuned EC2 at my previous employer: 1 <https://www.mongodb.com/blog/post/reducing-variability-performance-tests-ec2-setup-key-results>, 2 <https://www.mongodb.com/blog/post/repeatable-performance-tests-ec2-instances-neither-good-nor-bad>, 3 <https://www.mongodb.com/blog/post/repeadtable-performance-tests-ebs-instances-stable-option>, 4 <https://www.mongodb.com/blog/post/repeatable-performance-tests-cpu-options-best-disabled>.
- Trust no one, measure everything. For example, don't trust that what I'm writing here is true: run sysbench against your HW, then you have first-hand observations.
- Specifically, using EC2 has the additional benefit that its instance types can be considered well-known, standard HW configurations, more so than any on-premise system.

Performance testing is regression testing
- Important: run perf tests with the nightly build. Make sure your HW configuration is repeatable, with low variability from day to day.
- Less important / later:
  - Using complicated benchmarks (tpcc...) that try to model a real-world app. These can take weeks to develop, each.
  - Having lots of different benchmarks for "coverage".
- Adding the above two together: running a simple key-value test (e.g. YCSB) every night in an automated and repeatable way, and storing the result - whatever is considered relevant - so that you end up with a timeseries, is a great start, and I'd take this over a complicated "representative" benchmark any day.
- Use change detection to automatically and deterministically flag statistically significant change points (regressions). (A toy sketch follows after this list.)
- Literature: detecting-performance-regressions-with-datastax-hunter <https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4>
- Literature: Fallout: Distributed Systems Testing as a Service <https://www.semanticscholar.org/paper/0cebbfebeab6513e98ad1646cc795cabd5ddad8a>, Automated system performance testing at MongoDB <https://www.connectedpapers.com/main/0cebbfebeab6513e98ad1646cc795cabd5ddad8a/graph>
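To make the change detection bullet concrete, here is a toy sketch in Python. To be clear, this is not what Hunter does internally (Hunter uses the more robust E-divisive means algorithm); the function names, the sample numbers and the threshold below are all made up for illustration:

    # Toy change point detection over a timeseries of nightly benchmark
    # results. NOT Hunter's actual algorithm (Hunter uses E-divisive
    # means); this only illustrates flagging a statistically significant
    # shift in the series instead of comparing two individual runs.
    from statistics import mean, stdev

    def t_stat(left, right):
        # Welch's t-statistic: how many standard errors apart are the means?
        var_l, var_r = stdev(left) ** 2, stdev(right) ** 2
        denom = (var_l / len(left) + var_r / len(right)) ** 0.5
        return abs(mean(left) - mean(right)) / denom if denom else 0.0

    def find_change_point(series, min_segment=5, threshold=5.0):
        # Try every split point and keep the one with the strongest shift.
        best_i, best_t = None, 0.0
        for i in range(min_segment, len(series) - min_segment + 1):
            t = t_stat(series[:i], series[i:])
            if t > best_t:
                best_i, best_t = i, t
        # Only flag the split if the shift is significant (threshold is
        # an illustrative t-statistic cutoff, not a tuned value).
        return best_i if best_t >= threshold else None

    # Nightly ops/sec; a ~10% regression lands on run 10.
    nightly = [101, 99, 102, 100, 98, 101, 103, 100, 99, 102,
               91, 90, 92, 89, 91, 90, 92, 91, 90, 89]
    i = find_change_point(nightly)
    if i is not None:
        print(f"change point at run {i}: {mean(nightly[:i]):.1f}"
              f" -> {mean(nightly[i:]):.1f} ops/sec")

Run something like this against the stored timeseries after every nightly build and the "regression or noise?" decision becomes deterministic and based on the whole history, not on eyeballing two data points.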
Common gotchas:
- Testing with a small data set that fits entirely in RAM. A good dataset is 5x the RAM available to the DB process. Or you just test with the size a real production server would be running: at DataStax we have tests that use 1 TB and 1.5 TB data sets, because those tend to be standard maximum sizes (per node) at customers.
- The test runtime is too short. What a good test duration is depends on the database; the goal is to reach a stable state, but for an LSM database like Cassandra this can be hard. For other databases I worked with, the default is typically to flush every 15 to 60 seconds, and the test duration should be a multiple of that (3 to 5 min).
- Naive comparisons to determine whether a test result is a regression or not. For example, benchmarking the new release against the stable version, one run each, and reporting the result as "fact". Or comparing today's result with yesterday's.

Building perf testing systems following the above principles has had a lot of positive impact in my projects. For example, at my previous employer we caught 17 significant regressions during the 1-year development cycle of the next major version (see my paper above). Otoh after the GA release, during the next year, users reported only 1 significant performance regression. That is to say, the perf testing of nightly builds caught all but one regression in the new major version.

henrik

On Fri, Dec 30, 2022 at 7:41 AM Josh McKenzie <jmcken...@apache.org> wrote:

There was a really interesting presentation from the Lucene folks at ApacheCon about how they're doing perf regression testing. That, combined with some recent contributors wanting to get involved in some performance work and not having much direction or clarity on how to get involved, led some of us to come together and riff on what we might be able to take away from that presentation and context.

Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks":
https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p

Their nightly indexing benchmark site:
https://home.apache.org/~mikemccand/lucenebench/indexing.html

I checked in with a handful of performance-minded contributors in early December and we came up with a first draft; then some others of us met on an ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like it linked - I believe Joey Lynch has that).

Here's where we landed after the discussions earlier this month (1st page, estimated reading time 5 minutes):
https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#

Curious to hear what other perspectives there are out there on the topic.

Early Happy New Years everyone!

~Josh

--
Henrik Ingo