[ 
https://issues.apache.org/jira/browse/CASSANDRA-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094434#comment-14094434
 ] 

Robert Stupp commented on CASSANDRA-7731:
-----------------------------------------

The Yammer docs for [histograms, especially 
exponentially-decaying-reservoirs|http://metrics.codahale.com/manual/core/#histograms]
 (which are what C* uses here) say they ??are representative of (roughly) the last 
five minutes of data??. So IMO it is (roughly) correct to say that the values 
represent the average of the last five minutes.
The code itself is a bit misleading because of the method names themselves 
(for example, {{getLiveCellsPerSlice}} instead of the more accurate but very verbose 
{{getMedianLiveCellsPerSliceForLast5Minutes}}).
As far as I understand the documentation, the 
exponentially-decaying-reservoirs implementation is the best (cheapest) 
trade-off between accuracy and speed.
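
To illustrate why such a reservoir roughly reflects recent data without ever being cleared, here is a minimal sketch of the forward-decay idea. It is not the Yammer class: {{DecayingReservoirSketch}}, its methods, and the parameter values used with it are made up for illustration, and it omits the periodic rescaling and locking a real implementation needs.

```java
import java.util.Random;
import java.util.TreeMap;

// Simplified sketch of an exponentially decaying reservoir (forward decay).
// Hypothetical class, NOT the Yammer implementation: no rescaling, no locking.
class DecayingReservoirSketch {
    private final int capacity;   // maximum number of retained samples
    private final double alpha;   // decay factor; higher favours recent data more
    private final Random random;
    // priority -> value; the entry with the lowest priority is evicted first
    private final TreeMap<Double, Long> samples = new TreeMap<>();

    DecayingReservoirSketch(int capacity, double alpha, long seed) {
        this.capacity = capacity;
        this.alpha = alpha;
        this.random = new Random(seed);
    }

    // Record a value seen at timeSeconds (seconds since the reservoir was created).
    void update(long value, double timeSeconds) {
        double weight = Math.exp(alpha * timeSeconds);           // grows with time
        double priority = weight / (1.0 - random.nextDouble());  // divisor in (0, 1]
        if (samples.size() < capacity) {
            samples.put(priority, value);
        } else if (priority > samples.firstKey()
                   && samples.put(priority, value) == null) {
            samples.pollFirstEntry(); // a new key was added, drop the lowest priority
        }
    }

    int size() {
        return samples.size();
    }

    // Number of retained samples equal to the given value.
    long count(long value) {
        return samples.values().stream().filter(v -> v == value).count();
    }
}
```

Old values are never expired explicitly. They are just increasingly likely to be pushed out by newer values, whose priorities grow exponentially with time.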

Did I understand you correctly that you basically also want the max value (or 
some other percentile) for live/tombstone cells to detect these "heavy" 
requests? That should be easy to do and not really invasive (it just adds 
some methods to JMX) for 2.0.
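
As a rough illustration of why a max or high percentile would help, the sketch below computes nearest-rank percentiles over a plain array of per-slice cell counts. {{SliceStatsSketch}} and {{percentile}} are hypothetical names and the sample data is invented; this is not how the JMX methods would be implemented.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
class SliceStatsSketch {
    // Nearest-rank percentile; percent is an integer in 1..100 (100 = max).
    static long percentile(long[] values, int percent) {
        if (values.length == 0 || percent < 1 || percent > 100) {
            throw new IllegalArgumentException("need values and percent in 1..100");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100, converted to a zero-based index.
        int rank = (percent * sorted.length + 99) / 100;
        return sorted[Math.max(rank - 1, 0)];
    }

    // Invented sample: 95 cheap slices (10 cells) and 5 heavy ones (5000 cells).
    static long[] sampleCellCounts() {
        long[] counts = new long[100];
        Arrays.fill(counts, 0, 95, 10L);
        Arrays.fill(counts, 95, 100, 5000L);
        return counts;
    }
}
```

With that invented sample, {{percentile(counts, 50)}} is 10 while {{percentile(counts, 99)}} and {{percentile(counts, 100)}} are both 5000, so the median alone hides the heavy slices.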

Or do you want the time span covered by the histogram to be 
increased?

Clearing the histogram is IMO not an option.

> Average live/tombstone cells per slice
> --------------------------------------
>
>                 Key: CASSANDRA-7731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7731
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cyril Scetbon
>            Assignee: Robert Stupp
>            Priority: Minor
>
> I think you should not say that slice statistics are valid for the [last five 
> minutes 
> |https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java#L955-L956]
>  in the CFSTATS command of nodetool. I've read the Yammer documentation for 
> Histograms and there is no way to force values to expire after x minutes 
> except by 
> [clearing|http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.1.2/com/yammer/metrics/core/Histogram.java#96]
>  it. The only thing I can see is that the last snapshot used to provide the 
> median (or whatever you'd use instead) value is based on 1028 values.
> I think we should also be able to detect that some requests are accessing a 
> lot of live/tombstone cells per query. That's not possible for now without, 
> for example, activating DEBUG for SliceQueryFilter and tweaking the 
> threshold. Currently, since nodetool cfstats returns the median, if a small 
> share of the queries scan a lot of live/tombstone cells, we miss it!



--
This message was sent by Atlassian JIRA
(v6.2#6252)
