Re: TLP tools for stress testing and building test clusters in AWS

Benedict Elliott Smith Fri, 12 Apr 2019 10:42:02 -0700

+1

I’m just as excited to see some standardised workloads and a test bed.  At 
the moment we’re benefiting from some large contributors doing their own 
proprietary performance testing, which is super valuable and something we’ve 
lacked before.  But I’m also keen to see some more representative workloads 
that are reproducible by anybody in the community take shape.



> On 12 Apr 2019, at 18:09, Aleksey Yeshchenko <alek...@apple.com.INVALID> 
> wrote:
> 
> Hey Jon,
> 
> This sounds exciting and pretty useful, thanks.
> 
> Looking forward to using tlp-stress for validating 15066 performance.
> 
> We should touch base some time next week to pick a comprehensive set of 
> workloads and versions, perhaps?
> 
> 
>> On 12 Apr 2019, at 16:34, Jon Haddad <j...@jonhaddad.com> wrote:
>> 
>> I don't want to derail the discussion about Stabilizing Internode
>> Messaging, so I'm starting this as a separate thread.  There was a
>> comment that Josh made [1] about doing performance testing with real
>> clusters as well as a lot of microbenchmarks, and I'm 100% in support
>> of this.  We've been working on some tooling at TLP for the last
>> several months to make this a lot easier.  One of the goals has been
>> to help improve the 4.0 testing process.
>> 
>> The first tool we have is tlp-stress [2].  It's designed with a "get
>> started in 5 minutes" mindset.  My goal was to build a stress tool that
>> ships with real workloads out of the box that can be easily tweaked,
>> similar to how fio allows you to design a disk workload and tweak it
>> with parameters.  Included are workloads that stress LWTs (two
>> different types), materialized views, counters, time series, and
>> key-value workloads.  Each workload can be modified easily to change
>> compaction strategies, concurrent operations, and number of partitions.
>> We can run workloads for a set number of iterations or a custom
>> duration.  We've used this *extensively* at TLP to help our customers
>> and most of our blog posts that discuss performance use it as well.
>> It exports data to CSV and automatically sets up Prometheus for
>> metrics collection / aggregation.  As an example, we were able to
>> determine that the compression length set on the paxos tables imposes
>> a significant overhead when using the Locking LWT workload, which
>> simulates locking and unlocking of rows.  See CASSANDRA-15080 for
>> details.
>> 
>> We have documentation [3] on the TLP website.
>> 
>> The second tool we've been working on is tlp-cluster [4].  This tool
>> is designed to help provision AWS instances for the purposes of
>> testing.  To be clear, I don't expect, or want, this tool to be used
>> for production environments.  It's designed to assist with the
>> Cassandra build process by generating deb packages or re-using the
>> ones that have already been uploaded.  Here's a short list of the
>> things you'll care about:
>> 
>> 1. Create instances in AWS for Cassandra using any instance size and
>> number of nodes.  Also create tlp-stress instances and a box for
>> monitoring.
>> 2. Use any available build of Cassandra, with a quick option to change
>> YAML config.  For example: tlp-cluster use 3.11.4 -c
>> concurrent_writes:256
>> 3. Do custom builds just by pointing to a local Cassandra git repo.
>> They can be used the same way as #2.
>> 4. tlp-stress is automatically installed on the stress box.
>> 5. Everything's installed with pure bash.  I considered something more
>> complex, but since this is for development only, it turns out the
>> simplest tool possible works well and it means it's easily
>> configurable.  Just drop in your own bash script starting with a
>> number in a XX_script_name.sh format and it gets run.
>> 6. The monitoring box is running Prometheus.  It auto scrapes
>> Cassandra using the Instaclustr metrics library.
>> 7. Grafana is also installed automatically.  There are a couple of
>> sample graphs there now.  We plan on having better default graphs soon.
>> 
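
The XX_script_name.sh convention in item 5 can be pictured with a short Python sketch that collects numbered scripts from a directory and orders them by their numeric prefix. The helper name ordered_scripts, the acceptance of any number of leading digits, and the assumption that scripts run in ascending numeric order are all illustrative guesses, not a description of how tlp-cluster is implemented.

```python
import re
import tempfile
from pathlib import Path

# Matches names such as 10_install_java.sh: digits, an underscore, a name, ".sh".
SCRIPT_NAME = re.compile(r"^(\d+)_.+\.sh$")


def ordered_scripts(directory):
    """Return numbered .sh scripts in directory, sorted by numeric prefix."""
    found = []
    for path in Path(directory).iterdir():
        match = SCRIPT_NAME.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path.name, path))
    # Sort by number first, then by name to break ties deterministically.
    found.sort(key=lambda item: item[:2])
    return [path for _, _, path in found]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("20_cassandra.sh", "5_packages.sh", "notes.txt"):
            Path(tmp, name).touch()
        print([p.name for p in ordered_scripts(tmp)])
        # prints ['5_packages.sh', '20_cassandra.sh']
```

Sorting on the parsed integer rather than the raw file name avoids surprises when prefixes have different widths, since "5" sorts after "20" as a string.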
>> For the moment it installs Java 8 only but that should be easily
>> fixable to use Java 11 to test ZGC (it's on my radar).
>> 
>> Documentation for tlp-cluster is here [5].
>> 
>> There are still some things to work out in the tool, and we've been
>> working hard to smooth out the rough edges.  I still haven't announced
>> anything WRT tlp-cluster on the TLP blog, because I don't think it's
>> quite ready for public consumption, but I think the folks on this list
>> are smart enough to see the value in it even if it still has a few
>> warts.
>> 
>> I don't consider myself familiar enough with the networking patch to
>> give it a full review, but I am qualified to build tools to help test
>> it and go through the testing process myself.  From what I can tell
>> the patch is moving the codebase in a positive direction and I'd like
>> to help build confidence in it so we can get it merged in.
>> 
>> We'll continue to build out and improve the tooling with the goal of
>> making it easier for people to jump into the QA side of things.
>> 
>> Jon
>> 
>> [1] 
>> https://lists.apache.org/thread.html/742009c8a77999f4b62062509f087b670275f827d0c1895bf839eece@%3Cdev.cassandra.apache.org%3E
>> [2] https://github.com/thelastpickle/tlp-stress
>> [3] http://thelastpickle.com/tlp-stress/
>> [4] https://github.com/thelastpickle/tlp-cluster
>> [5] http://thelastpickle.com/tlp-cluster/
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org
>> 