[
https://issues.apache.org/jira/browse/CASSANDRA-7731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133715#comment-14133715
]
Cyril Scetbon commented on CASSANDRA-7731:
------------------------------------------
[~snazy] Your link "exponentially-decaying-reservoirs" is outdated. It seems
that the project has been removed from GitHub or renamed.
My tests show that the maximum value collected can persist for far longer than
5 minutes. In the following example, I execute one CQL query that scans 2
tombstones, and then one CQL query per second that scans 0 tombstones. After
more than 1300 queries, I still have the same max value. When I check the list
of values, it doesn't seem to change, even though the mean changes.
{code}
val 545 = 0.0
val 546 = 2.0
count = 1330
max = 2.0
pmax = 2.0
mean = 0.0015037593984962407
min = 0.0
Median = 0.0
99p = 0.0
{code}
So even though the mean is calculated correctly, I can't understand why the max
value is still the same after 20 minutes of queries scanning 0 tombstones.
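For illustration only, here is a minimal sketch of a forward-decay reservoir, to show one way an old sample can outlive any nominal time window. It is a simplified model and not the yammer metrics implementation: the class name, the method names and the decay factor of 0.015 are assumptions, and only the capacity of 1028 comes from the issue description. In this model nothing is evicted until the reservoir is full, so the single 2.0 sample cannot leave before the 1029th update, which is about 17 minutes at one query per second. It does not attempt to reproduce the exact numbers above.

```java
import java.util.Random;
import java.util.TreeMap;

// Simplified forward-decay reservoir. Hypothetical sketch, NOT the yammer metrics code.
final class DecayingReservoirSketch {
    static final int SIZE = 1028;      // capacity; the figure quoted in the issue description
    static final double ALPHA = 0.015; // decay factor per second; an assumption

    private final TreeMap<Double, Double> values = new TreeMap<>(); // priority -> sample
    private final Random random;

    DecayingReservoirSketch(long seed) {
        this.random = new Random(seed);
    }

    void update(double value, long secondsSinceStart) {
        // Newer samples get exponentially larger weights, hence higher priorities.
        double weight = Math.exp(ALPHA * secondsSinceStart);
        double priority = weight / (1.0 - random.nextDouble()); // divisor lies in (0, 1]
        if (values.size() < SIZE) {
            values.put(priority, value); // not full yet: nothing is evicted
        } else if (priority > values.firstKey()
                && values.putIfAbsent(priority, value) == null) {
            values.pollFirstEntry(); // full: drop the lowest-priority sample
        }
    }

    // Largest sample currently held (samples here are never negative).
    double max() {
        double max = 0.0;
        for (double v : values.values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Feeds one 2.0 sample at t=0, then one 0.0 sample per second.
    // Returns the second at which 2.0 leaves the reservoir, or -1 if it is
    // still present after limitSeconds.
    static long secondsUntilMaxDrops(long seed, long limitSeconds) {
        DecayingReservoirSketch reservoir = new DecayingReservoirSketch(seed);
        reservoir.update(2.0, 0L);
        for (long t = 1; t <= limitSeconds; t++) {
            reservoir.update(0.0, t);
            if (reservoir.max() == 0.0) {
                return t;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("2.0 left the reservoir after "
                + secondsUntilMaxDrops(42L, 3600L) + " s");
    }
}
```

The point of the sketch is that eviction here is driven by the number of samples and their priorities, not by a wall-clock expiry, so a "last five minutes" guarantee does not follow from the sampling scheme alone.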
I have to confess that after 30 minutes, I get the expected behavior:
{code}
val 142 = 0.0
val 143 = 0.0
val 144 = 0.0
count = 1473
max = 2.0
pmax = 0.0
mean = 0.0013577732518669382
min = 0.0
Median = 0.0
99p = 0.0
{code}
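As a quick sanity check, both reported means are consistent with a single 2.0 divided by the total count (2/1330 and 2/1473), which would explain why the mean keeps moving while the list of sampled values does not. The class and method names below are hypothetical and only verify that arithmetic:

```java
// Hypothetical helper: checks that the reported means equal sum / count
// for a single 2.0 sample among `count` updates.
final class MeanCheck {
    static double lifetimeMean(double sum, long count) {
        return sum / count;
    }

    public static void main(String[] args) {
        System.out.println(lifetimeMean(2.0, 1330L)); // first output above
        System.out.println(lifetimeMean(2.0, 1473L)); // second output above
    }
}
```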
However, I need to be sure that the problem is solved and that a stale max
lasts only 5 minutes, not 30 minutes.
> Get max values for live/tombstone cells per slice
> -------------------------------------------------
>
> Key: CASSANDRA-7731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7731
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Cyril Scetbon
> Assignee: Robert Stupp
> Priority: Minor
> Fix For: 2.1.1
>
> Attachments: 7731-2.0.txt, 7731-2.1.txt
>
>
> I think you should not say that slice statistics are valid for the [last five
> minutes
> |https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java#L955-L956]
> in CFSTATS command of nodetool. I've read the documentation from yammer for
> Histograms and there is no way to force values to expire after x minutes
> except by
> [clearing|http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.1.2/com/yammer/metrics/core/Histogram.java#96]
> it. The only thing I can see is that the last snapshot used to provide the
> median (or whatever you use instead) value is based on 1028 values.
> I think we should also be able to detect that some requests are accessing a
> lot of live/tombstone cells per query and that's not possible for now without
> activating DEBUG for SliceQueryFilter for example and by tweaking the
> threshold. Currently, because nodetool cfstats returns the median, we miss the
> case where a small fraction of the queries scans a lot of live/tombstone cells.