[jira] [Commented] (CASSANDRA-11752) histograms/metrics in 2.2 do not appear recency biased

Chris Burroughs (JIRA) Wed, 11 May 2016 15:48:36 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280940#comment-15280940
 ]


Chris Burroughs commented on CASSANDRA-11752:
---------------------------------------------

So the [point|http://metrics.dropwizard.io/3.1.0/] of the metrics library is to 
give "insight into what your code does in production".  It is integrated into 
many projects.  Users expect to be able to take those metrics and:
 * Draw a [line 
graph|http://www.datastax.com/dev/blog/pluggable-metrics-reporting-in-cassandra-2-0-2].
 * Alert on values so they know when there are problems with a cluster.
 * Use jconsole to inspect beans and determine what is happening Right Now.

I am aware that there are concerns about both the implementation and the 
assumptions (normal distribution) of the metrics library.  They have been 
brought up both on [this bug tracker|https://issues.apache.org/jira/browse/CASSANDRA-6486] 
and in other forums.  However imperfect, jconsole, line graphs, and 
threshold-based alerts are of critical practical use today.  All of these 
require *recent* data.  When my cluster is failing to meet business needs I 
want to know as soon as possible.
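
To make the recency point concrete, here is a minimal sketch of the effect.  It 
is not Cassandra's histogram nor the metrics library's reservoir; the class 
name, bucket bounds, decay factor, and workload are all made up for 
illustration.  It feeds the same samples into a bucketed histogram that never 
forgets and into one whose counts are scaled down every reporting interval.

```java
public class RecencyBiasSketch {
    // Hypothetical bucket upper bounds in milliseconds (not Cassandra's real buckets).
    static final long[] BOUNDS = {1, 2, 5, 10, 20, 50, 100, 200, 500, 1000};

    private final double[] counts = new double[BOUNDS.length];
    private final double decayPerTick; // 1.0 means "never forget anything"

    public RecencyBiasSketch(double decayPerTick) {
        this.decayPerTick = decayPerTick;
    }

    // Count the value in the first bucket whose bound covers it;
    // anything larger than the last bound lands in the last bucket.
    public void update(long valueMs) {
        int i = 0;
        while (i < BOUNDS.length - 1 && valueMs > BOUNDS[i]) {
            i++;
        }
        counts[i] += 1.0;
    }

    // Called once per reporting interval: scale every bucket so old samples fade.
    public void tick() {
        for (int i = 0; i < counts.length; i++) {
            counts[i] *= decayPerTick;
        }
    }

    // Upper bound of the bucket containing the q-th quantile of the weighted counts.
    public long percentile(double q) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0L;
        }
        double seen = 0.0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= q * total) {
                return BOUNDS[i];
            }
        }
        return BOUNDS[BOUNDS.length - 1];
    }

    // Assumed workload: 1000 healthy intervals of 2 ms requests,
    // then 5 degraded intervals of 500 ms requests, 100 requests each.
    public static long simulateP99(double decayPerTick) {
        RecencyBiasSketch h = new RecencyBiasSketch(decayPerTick);
        for (int interval = 0; interval < 1005; interval++) {
            long latencyMs = interval < 1000 ? 2 : 500;
            for (int request = 0; request < 100; request++) {
                h.update(latencyMs);
            }
            h.tick();
        }
        return h.percentile(0.99);
    }

    public static void main(String[] args) {
        System.out.println("cumulative p99 (ms): " + simulateP99(1.0)); // prints 2
        System.out.println("decayed p99 (ms):    " + simulateP99(0.5)); // prints 500
    }
}
```

With these assumed numbers the never-forgetting histogram still reports a 2 ms 
p99 after five intervals of 500 ms requests, which is the flat line a 
threshold alert would never fire on.  The decayed one reports 500 ms.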

If I understand your proposal correctly, you are saying it would be better to 
drop all of that, and that it would be much more powerful (and mathematically 
sound!) to do an out-of-band export and merge of all of the histograms and 
create a heatmap.  
This would provide better insight into the distribution of values (by showing 
the full distribution instead of a handful of percentiles) and allow for 
cluster-wide aggregation.  This could be further augmented by using [hue and 
saturation|https://docs.joyent.com/public-cloud/d-40-performance/cloud-analytics/use-of-color-in-cloud-analytics]
 to call out latencies for individual nodes or column families.  I think that 
sounds fantastic, but that is very much not where the industry is today.  Maybe 
Circonus can do that, but graphite definitely can't.
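
As I understand it, the "mathematically sound" part is that bucket counts from 
different nodes can simply be added, whereas percentiles cannot be averaged.  
A minimal sketch of that difference follows.  The class name, bucket bounds, 
and per-node counts are invented for illustration and are not taken from 
Cassandra or from any exporter.

```java
public class HistogramMergeSketch {
    // Hypothetical bucket upper bounds in milliseconds, shared by every node.
    static final long[] BOUNDS = {1, 10, 100, 1000};

    // Merge by adding counts bucket by bucket; valid only because all
    // nodes are assumed to use the same bucket bounds.
    public static long[] merge(long[]... perNode) {
        long[] merged = new long[BOUNDS.length];
        for (long[] counts : perNode) {
            for (int i = 0; i < merged.length; i++) {
                merged[i] += counts[i];
            }
        }
        return merged;
    }

    // Upper bound of the bucket containing the q-th quantile.
    public static long percentile(long[] counts, double q) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0L;
        }
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= q * total) {
                return BOUNDS[i];
            }
        }
        return BOUNDS[BOUNDS.length - 1];
    }

    public static void main(String[] args) {
        long[] fastBusyNode = {900, 100, 0, 0};  // 1000 requests, all <= 10 ms
        long[] slowQuietNode = {0, 0, 50, 50};   // 100 requests, all > 10 ms

        // Averaging per-node percentiles ignores how many requests each node served.
        long averaged = (percentile(fastBusyNode, 0.99)
                + percentile(slowQuietNode, 0.99)) / 2;
        // Merging first gives the percentile of the combined request population.
        long merged = percentile(merge(fastBusyNode, slowQuietNode), 0.99);

        System.out.println("average of per-node p99 (ms): " + averaged); // prints 505
        System.out.println("p99 of merged histogram (ms): " + merged);   // prints 1000
    }
}
```

The averaged figure (505 ms) does not correspond to any bucket or any request, 
while the merged figure is the p99 bucket of the combined traffic.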

And however cool that future sounds, the NEWS entry makes no mention of this as 
an intentional, fundamental change.  Nor does CASSANDRA-5657 discuss the 
consequences.  Indeed, CASSANDRA-5657 hoped for improved accuracy and went out 
of its way to keep JMX functioning!

> histograms/metrics in 2.2 do not appear recency biased
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11752
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11752
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Chris Burroughs
>              Labels: metrics
>         Attachments: boost-metrics.png, c-jconsole-comparison.png, 
> c-metrics.png, default-histogram.png
>
>
> In addition to upgrading to metrics3, CASSANDRA-5657 switched to using a 
> custom histogram implementation.  After upgrading to Cassandra 2.2 
> histograms/timer metrics are now suspiciously flat.  To be useful for 
> graphing and alerting, metrics need to be biased towards recent events.
> I have attached images that I think illustrate this.
>  * The first two are a comparison between latency observed by a C* 2.2 (us) 
> cluster showing very flat lines and a client (using metrics 2.2.0, ms) 
> showing server performance problems.  We can't rule out with total certainty 
> that something else isn't the cause (that's why we measure from both the 
> client & server) but they very rarely disagree.
>  * The 3rd image compares jconsole viewing of metrics on a 2.2 and 2.1 
> cluster over several minutes.  Not a single digit changed on the 2.2 cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
