Since I cited several papers in my essay below, I might as well add the latest one, which describes our use of automatic change point detection inside DataStax. We've indirectly been testing Cassandra 4.0 with this method for over a year already, as we run change detection against an internal fork of 4.0.
https://arxiv.org/abs/2301.03034

henrik

On Sun, Jan 8, 2023 at 5:12 AM Henrik Ingo <henrik.i...@datastax.com> wrote:

Hi Josh, all

I'm sitting at an airport, so rather than participating in the comment threads in the doc, I will just post some high-level principles I've derived during my own long career in performance testing.

Infra:
- It's a common myth that you need to use on-premise HW because cloud HW is noisy.
- Most likely the opposite is true: a small cluster of lab hardware runs the risk of some sysadmin with root access manually modifying the servers and leaving them in an inconsistent configuration. Otoh a public cloud is configured with infrastructure as code, so every change is tracked in version control.
- Four-part article on how we tuned EC2 at my previous employer: 1 <https://www.mongodb.com/blog/post/reducing-variability-performance-tests-ec2-setup-key-results>, 2 <https://www.mongodb.com/blog/post/repeatable-performance-tests-ec2-instances-neither-good-nor-bad>, 3 <https://www.mongodb.com/blog/post/repeadtable-performance-tests-ebs-instances-stable-option>, 4 <https://www.mongodb.com/blog/post/repeatable-performance-tests-cpu-options-best-disabled>.
- Trust no one, measure everything. For example, don't trust that what I'm writing here is true: run sysbench against your HW, then you have first-hand observations.
- Specifically, using EC2 has the additional benefit that its instance types can be considered well-known, standard HW configurations, more so than any on-premise system.

Performance testing is regression testing
- Important: run perf tests with the nightly build. Make sure your HW configuration is repeatable, with low variability from day to day.
- Less important / later:
  - Using complicated benchmarks (tpcc...) that try to model a real-world app. These can take weeks to develop, each.
  - Having lots of different benchmarks for "coverage".
- Adding the above two together: running a simple key-value test (e.g. YCSB) every night in an automated and repeatable way, and storing the result - whatever is considered relevant - so that you end up with a timeseries, is a great start, and I'd take this over a complicated "representative" benchmark any day.
- Use change detection to automatically and deterministically flag statistically significant change points (regressions). (A toy sketch follows after this list.)
- Literature: detecting-performance-regressions-with-datastax-hunter <https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4>
- Literature: Fallout: Distributed Systems Testing as a Service <https://www.semanticscholar.org/paper/0cebbfebeab6513e98ad1646cc795cabd5ddad8a>, Automated system performance testing at MongoDB <https://www.connectedpapers.com/main/0cebbfebeab6513e98ad1646cc795cabd5ddad8a/graph>
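To make the change detection bullet concrete, here is a toy sketch in Python. To be clear, this is not what Hunter does internally (Hunter uses the more robust E-divisive means algorithm); the function names, the sample numbers and the threshold below are all made up for illustration:

    # Toy change point detection over a timeseries of nightly benchmark
    # results. NOT Hunter's actual algorithm (Hunter uses E-divisive
    # means); this only illustrates flagging a statistically significant
    # shift in the series instead of comparing two individual runs.
    from statistics import mean, stdev

    def t_stat(left, right):
        # Welch's t-statistic: how many standard errors apart are the means?
        var_l, var_r = stdev(left) ** 2, stdev(right) ** 2
        denom = (var_l / len(left) + var_r / len(right)) ** 0.5
        return abs(mean(left) - mean(right)) / denom if denom else 0.0

    def find_change_point(series, min_segment=5, threshold=5.0):
        # Try every split point and keep the one with the strongest shift.
        best_i, best_t = None, 0.0
        for i in range(min_segment, len(series) - min_segment + 1):
            t = t_stat(series[:i], series[i:])
            if t > best_t:
                best_i, best_t = i, t
        # Only flag the split if the shift is significant (threshold is
        # an illustrative t-statistic cutoff, not a tuned value).
        return best_i if best_t >= threshold else None

    # Nightly ops/sec; a ~10% regression lands on run 10.
    nightly = [101, 99, 102, 100, 98, 101, 103, 100, 99, 102,
               91, 90, 92, 89, 91, 90, 92, 91, 90, 89]
    i = find_change_point(nightly)
    if i is not None:
        print(f"change point at run {i}: {mean(nightly[:i]):.1f}"
              f" -> {mean(nightly[i:]):.1f} ops/sec")

Run something like this against the stored timeseries after every nightly build and the "regression or noise?" decision becomes deterministic and based on the whole history, not on eyeballing two data points.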
Common gotchas:
- Testing with a small data set that fits entirely in RAM. A good dataset is 5x the RAM available to the DB process. Or you just test with the size a real production server would be running: at DataStax we have tests that use 1 TB and 1.5 TB data sets, because those tend to be standard maximum sizes (per node) at customers.
- The test runtime is too short. What a good test duration is depends on the database; the goal is to reach a stable state, but for an LSM database like Cassandra this can be hard. For other databases I worked with, the default is typically to flush every 15 to 60 seconds, and the test duration should be a multiple of that (3 to 5 min).
- Naive comparisons to determine whether a test result is a regression or not. For example, benchmarking the new release against the stable version, one run each, and reporting the result as "fact". Or comparing today's result with yesterday's.

Building perf testing systems following the above principles has had a lot of positive impact in my projects. For example, at my previous employer we caught 17 significant regressions during the 1-year development cycle of the next major version (see my paper above). Otoh after the GA release, during the next year, users reported only 1 significant performance regression. That is to say, the perf testing of nightly builds caught all but one regression in the new major version.

henrik

On Fri, Dec 30, 2022 at 7:41 AM Josh McKenzie <jmcken...@apache.org> wrote:

There was a really interesting presentation from the Lucene folks at ApacheCon about how they're doing perf regression testing. That, combined with some recent contributors wanting to get involved in some performance work and not having much direction or clarity on how to get involved, led some of us to come together and riff on what we might be able to take away from that presentation and context.

Lucene presentation: "Learning from 11+ years of Apache Lucene benchmarks":
https://docs.google.com/presentation/d/1Tix2g7W5YoSFK8jRNULxOtqGQTdwQH3dpuBf4Kp4ouY/edit#slide=id.p

Their nightly indexing benchmark site:
https://home.apache.org/~mikemccand/lucenebench/indexing.html

I checked in with a handful of performance-minded contributors in early December and we came up with a first draft; then some others of us met on an ad hoc call on 12/9 (which was recorded; ping on this thread if you'd like it linked - I believe Joey Lynch has that).

Here's where we landed after the discussions earlier this month (1st page, estimated reading time 5 minutes):
https://docs.google.com/document/d/1X5C0dQdl6-oGRr9mXVPwAJTPjkS8lyt2Iz3hWTI4yIk/edit#

Curious to hear what other perspectives there are out there on the topic.

Early Happy New Years everyone!

~Josh

--
Henrik Ingo