[ 
https://issues.apache.org/jira/browse/CASSANDRA-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-9870:
--------------------------------
    Description: 
CASSANDRA-7918 introduces graph output from a stress run, but these graphs are 
a little limited. Attached to the ticket is an example of some improved graphs 
that can serve as the *basis* for some improvements, which I will briefly 
describe. They should not be taken as the exact end goal, but we should aim for 
at least their functionality, preferably with some JavaScript advantages thrown 
in, such as the hiding of datasets/graphs for clarity. Any ideas for 
improvements are *definitely* encouraged.

Some overarching design principles:
* Display _on *one* screen_ all of the information necessary to get a good idea 
of how two or more branches compare to each other. Ideally we will reintroduce 
this behaviour, painting multiple graphs onto one screen, stretched to fit.
* Axes must be truncated to only the interesting dimensions, to ensure there is 
no wasted space.
* Each graph displaying multiple kinds of data should use colour _and shape_ to 
help easily distinguish the different datasets.
* Each graph should be tailored to the data it is representing, and we should 
have multiple views of each kind of data.

The data can roughly be partitioned into three kinds:
* throughput
* latency
* gc

These can each be viewed in different ways:
* as a continuous plot of:
** raw data
** scaled/compared to a "base" branch, or other metric
** cumulatively
* as box plots
** ideally, these will plot median, outer quartiles, outer deciles and absolute 
limits of the distribution, so the shape of the data can be best understood

Each compresses the information differently, losing different information, so 
that collectively they help to understand the data.
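
As a sketch of what one such box plot needs, the summary statistics above 
(median, outer quartiles, outer deciles, absolute limits) could be computed 
along these lines. The linear-interpolation percentile scheme here is an 
illustrative choice, not a requirement on the final tool:

```javascript
// Percentile by linear interpolation between ranks of a sorted array.
function percentile(sorted, p) {
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx), hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// The full set of marks one box plot needs: median, outer quartiles
// (25th/75th), outer deciles (10th/90th) and the absolute limits.
function boxPlotStats(samples) {
  const s = [...samples].sort((a, b) => a - b);
  return {
    min: s[0],
    p10: percentile(s, 0.10),
    q1:  percentile(s, 0.25),
    median: percentile(s, 0.50),
    q3:  percentile(s, 0.75),
    p90: percentile(s, 0.90),
    max: s[s.length - 1],
  };
}
```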

Some basic rules for presentation that work well:
* Latency information should be plotted to a logarithmic scale, to avoid high 
latencies drowning out low ones
* GC information should be plotted cumulatively, to avoid differing throughputs 
giving the impression of worse GC. It should also have a line that is rescaled 
by the amount of work (number of operations) completed
* Throughput should be plotted as the actual numbers
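
The GC rule above, sketched under the assumption that each stress reporting 
interval carries a GC-pause total and an operation count (the `gcMillis` and 
`ops` field names are hypothetical, not the actual stress output format):

```javascript
// For each interval, emit both the raw cumulative GC line and the
// same total rescaled by work done, so a branch that simply pushes
// more operations through does not look like it has worse GC.
function gcSeries(intervals) {
  let gcTotal = 0, opsTotal = 0;
  return intervals.map(iv => {
    gcTotal += iv.gcMillis;
    opsTotal += iv.ops;
    return {
      cumulativeGcMillis: gcTotal,        // raw cumulative line
      gcMillisPerOp: gcTotal / opsTotal,  // rescaled by operations completed
    };
  });
}
```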

To walk the graphs top-left to bottom-right, we have:

* Spot throughput comparison of branches to the baseline branch, as an 
improvement ratio (which can of course be negative, but is not in this example)
* Raw throughput of all branches (no baseline)
* Raw throughput as a box plot
* Latency percentiles, compared to baseline. The percentage improvement at any 
point in time vs baseline is calculated, and then multiplied by the overall 
median for the entire run. This simply permits the non-baseline branches to 
scatter their wins/losses around a relatively clustered line for each 
percentile. It's probably the most "dishonest" graph, but comparing something 
like latency, where each data point can have very high variance, is difficult, 
and this gives you an idea of the clustering of improvements/losses.
* Latency percentiles, raw, each with a different shape; lowest percentiles 
plotted as a solid line as they vary least, with higher percentiles each 
getting their own subtly different shape to scatter.
* Latency box plots
* GC time, plotted cumulatively and also scaled by work done
* GC MB, plotted cumulatively and also scaled by work done
* GC time, raw
* GC time as a box plot
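
The baseline-relative latency transform in that walk could be sketched as 
follows; the paired-array inputs and the exact ratio convention are 
assumptions on my part:

```javascript
// At each paired point in time, compute the improvement ratio vs the
// baseline and multiply by the overall median of the run, so each
// branch's wins/losses scatter around a clustered line near the median.
// Ratios above the median line are improvements, below are regressions.
function vsBaseline(branchLatencies, baselineLatencies, overallMedian) {
  return branchLatencies.map((lat, i) =>
    (baselineLatencies[i] / lat) * overallMedian);
}
```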

Most of these graphs introduce the concept of a "baseline" branch. Ideally, 
this baseline would be selected by a dropdown so the JavaScript can transform 
the output dynamically. This would permit more interesting comparisons to be 
made on the fly.

There are also some complexities, such as deciding which datapoints to compare 
against baseline when times get out-of-whack (due to GC, etc, causing a lack of 
output for a period). The version I uploaded does a merge of the times, 
permitting a small degree of variance, and ignoring those datapoints we cannot 
pair. One option here might be to change stress' behaviour to always print to a 
strict schedule, instead of trying to get absolutely accurate apportionment of 
timings. If this makes things much simpler, it can be done.
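
A minimal sketch of that pairing step, assuming a simple two-pointer merge 
over two timestamped series; this illustrates the idea, it is not a 
transcription of the uploaded version:

```javascript
// Walk two time-ordered series and pair points whose timestamps differ
// by at most `tolerance`, discarding any point that cannot be paired.
function pairSeries(a, b, tolerance) {
  const pairs = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const dt = a[i].time - b[j].time;
    if (Math.abs(dt) <= tolerance) {
      pairs.push([a[i], b[j]]);  // close enough: compare these two
      i++; j++;
    } else if (dt < 0) {
      i++;  // a[i] has no close partner; ignore it
    } else {
      j++;  // b[j] has no close partner; ignore it
    }
  }
  return pairs;
}
```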

As previously stated, though it may be lost in the wall of text: these should 
be taken as a starting point / signpost, rather than a golden rule for the end 
goal. But ideally they will be the lower bound of what we can deliver.



> Improve cassandra-stress graphing
> ---------------------------------
>
>                 Key: CASSANDRA-9870
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9870
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Benedict
>         Attachments: reads.svg
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
