[
https://issues.apache.org/jira/browse/CASSANDRA-8503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250242#comment-14250242
]
Ariel Weisberg edited comment on CASSANDRA-8503 at 12/17/14 6:17 PM:
---------------------------------------------------------------------
I think there are two general classes of benchmarks you would run in CI:
representative user workloads, and targeted microbenchmark workloads. Targeted
workloads are a huge help during ongoing development because they magnify the
impact of regressions from code changes that are harder to notice in
representative workloads. They also point to the specific subsystem being
benchmarked.
I will just cover the microbenchmarks. The full matrix is large, so there is an
element of wanting ponies, but the reality is that they are all interesting
from the perspective of preventing performance regressions and understanding
the impact of ongoing changes.
Benchmark the stress client: excess server capacity and a single client,
testing lots of small messages, lots of large messages, and requests the
servers can answer as fast as possible. The flip side of this workload is the
same thing but for the server, where you measure how many trivially answerable
tiny queries you can shove through a cluster given excess client capacity.
Testing the server might also be where you test the matrix of replication and
consistency levels.
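As a rough sketch, assuming 2.1-era cassandra-stress syntax (node addresses,
counts, thread counts, and payload sizes here are placeholders):
{code}
# Client-bound: single stress client, tiny payloads, servers with plenty of headroom
cassandra-stress write n=1000000 cl=ONE -col size=fixed(10) -rate threads=25 -node 10.0.0.1

# Same shape with large payloads
cassandra-stress write n=100000 cl=ONE -col size=fixed(100000) -rate threads=25 -node 10.0.0.1

# Server-bound flip side: several such clients pushing trivially answerable point reads
cassandra-stress read n=10000000 cl=ONE -rate threads=300 -node 10.0.0.1,10.0.0.2,10.0.0.3
{code}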
Benchmark performance of non-prepared statements.
Benchmark performance of preparing statements?
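As a concrete illustration of the difference, a minimal sketch with the
DataStax Java driver (contact point, keyspace, and table are hypothetical; the
point is parse-per-request versus parse-once-and-bind):
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class PreparedVsUnprepared {
    public static void main(String[] args) {
        // Hypothetical cluster and schema: ks.kv (k text PRIMARY KEY, v text)
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Non-prepared: the server parses the statement on every request.
        for (int i = 0; i < 100000; i++) {
            session.execute("INSERT INTO ks.kv (k, v) VALUES ('" + i + "', 'x')");
        }

        // Prepared: parse once, then bind and execute repeatedly.
        PreparedStatement ps = session.prepare("INSERT INTO ks.kv (k, v) VALUES (?, ?)");
        for (int i = 0; i < 100000; i++) {
            session.execute(ps.bind(Integer.toString(i), "x"));
        }

        cluster.close();
    }
}
{code}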
A full test matrix for data-intensive workloads would test read, write, and
50/50, and for a bonus 90/10: single-cell partitions with a small value and a
large value, plus a range of wide rows (small, medium, large), across all 3
compaction strategies with compression on and off. Data-intensive workloads
also need to run against both spinning rust and SSDs.
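A user-mode stress profile is probably the right vehicle for this part of the
matrix. A rough sketch assuming the 2.1 stress YAML format (keyspace, table,
and distributions are invented; vary value size, clustering count, compaction
strategy, and compression per run):
{code}
# wide_rows.yaml -- illustrative profile, not a finished workload
keyspace: perfregress
keyspace_definition: |
  CREATE KEYSPACE perfregress WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
table: wide
table_definition: |
  CREATE TABLE wide (
    key text,
    col int,
    value blob,
    PRIMARY KEY (key, col)
  ) WITH compaction = {'class': 'LeveledCompactionStrategy'}
    AND compression = {'sstable_compression': 'LZ4Compressor'};
columnspec:
  - name: key
    population: uniform(1..1M)
  - name: col
    cluster: fixed(1000)   # "medium" wide rows; small/large variants change this
  - name: value
    size: fixed(100)       # small value; the large-value variant bumps this up
insert:
  partitions: fixed(1)
queries:
  point:
    cql: select * from wide where key = ? and col = ?
    fields: samerow
{code}
Driven with something like {{cassandra-stress user profile=wide_rows.yaml ops(insert=1,point=1)}}
for the 50/50 mix, adjusting the ops ratio for the read-heavy and write-heavy runs.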
CQL-specific microbenchmarks against specific CQL datatypes. If there are
important interactions we should capture those. Two that deserve their own
workloads (example statements below):
Counters
Lightweight transactions
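For illustration, the kinds of statements those two workloads would hammer
(table and column names are made up):
{code}
-- Counters: every update is a read-modify-write on the replica
CREATE TABLE page_views (page text PRIMARY KEY, views counter);
UPDATE page_views SET views = views + 1 WHERE page = '/index';

-- Lightweight transactions: a Paxos round per conditional operation
CREATE TABLE users (name text PRIMARY KEY, email text);
INSERT INTO users (name, email) VALUES ('ariel', 'a@example.com') IF NOT EXISTS;
UPDATE users SET email = 'b@example.com' WHERE name = 'ariel' IF email = 'a@example.com';
{code}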
The matrix also needs to include different permutations of replication
strategies and consistency levels. Maybe we can constrain those variations to
parts of the matrix that would best reflect the impact of replication
strategies and CL. Probably a subset of the data intensive workloads.
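In stress terms that mostly amounts to varying the cl= parameter and the
schema's replication settings, for example (factor and consistency level here
are arbitrary):
{code}
cassandra-stress write n=1000000 cl=QUORUM -schema "replication(factor=3)" -node 10.0.0.1,10.0.0.2,10.0.0.3
{code}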
We also want a workload targeting the row cache and key cache, both when
everything is cached and when there is a realistic long tail of data not in the
cache.
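A minimal sketch of the cached-table side, assuming 2.1-style caching options
(table and column names are made up):
{code}
CREATE TABLE hot_lookup (
  key text PRIMARY KEY,
  value blob
) WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
{code}
The long-tail case is then just a key population distribution wider than what
the cache can hold.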
For every workload, to really get the value, you would like a graph of
throughput and a graph of latency at some percentile, with a data point per
revision tested going back to the beginning, as well as a 90-day graph. A trend
line also helps. Then someone has to own monitoring the graphs and poking
people when there is an issue.
The workflow usually goes something like this: the monitor tags the author of
the suspected bad revision, who triages it and either fixes it or hands it off
to the correct person. Timeliness is really important, because once regressions
start stacking it's a pain to know whether you have done what you should to fix
them.
> Collect important stress profiles for regression analysis done by jenkins
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-8503
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8503
> Project: Cassandra
> Issue Type: Task
> Reporter: Ryan McGuire
> Assignee: Ryan McGuire
>
> We have a weekly job setup on CassCI to run a performance benchmark against
> the dev branches as well as the last stable releases.
> Here's an example:
> http://cstar.datastax.com/tests/id/8223fe2e-8585-11e4-b0bf-42010af0688f
> This test is currently pretty basic: it's running on three nodes with the
> default stress profile. We should crowdsource a collection of stress profiles
> to run, and then once we have many of these tests running we can collect them
> all into a weekly email.
> Ideas:
> * Timeseries (Can this be done with stress? not sure)
> * compact storage
> * compression off
> * ...