Jeff Griffith created CASSANDRA-11751:
-----------------------------------------

             Summary: Histogram overflow in metrics
                 Key: CASSANDRA-11751
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.2.6 on Linux
            Reporter: Jeff Griffith


One particular histogram in the Cassandra metrics seems to overflow, preventing 
the calculation of the mean on the Dropwizard "Snapshot". Here is the exception 
that comes from the metrics library:

{code}
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
        at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
        at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
{code}

On deeper analysis, it seems like this is happening specifically on this metric:
{code}
ColUpdateTimeDeltaHistogram
{code}

I think this is where it is updated, in ColumnFamilyStore.java:
{code}
    public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
    {
        long start = System.nanoTime();
        Memtable mt = data.getMemtableFor(opGroup, replayPosition);
        final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
        maybeUpdateRowCache(key);
        metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
        metric.writeLatency.addNano(System.nanoTime() - start);
        if(timeDelta < Long.MAX_VALUE)
            metric.colUpdateTimeDeltaHistogram.update(timeDelta);
    }
{code}

Considering it's calculating a mean, I don't know if perhaps a large sum might 
be overflowing? But that "if (timeDelta < Long.MAX_VALUE)" looks suspect, 
doesn't it?
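
To make the suspicion concrete, here is a minimal sketch of how a bucketed 
histogram with a trailing overflow bucket can end up in this state. It is NOT 
Cassandra's EstimatedHistogram: the class names, the bucket count of 10 and the 
1.2 growth factor are assumptions chosen for illustration only. The point is 
that a single sample above the largest bucket bound, even one far below 
Long.MAX_VALUE, is enough to make the mean uncomputable, so a guard that only 
filters out Long.MAX_VALUE would not prevent it.

```java
import java.util.Arrays;

// Hypothetical illustration only; names and sizes are not taken from Cassandra.
class OverflowSketch {

    /** Simplified bucketed histogram with one extra "overflow" slot at the end. */
    static final class BucketHistogram {
        private final long[] offsets; // inclusive upper bound of each regular bucket
        private final long[] counts;  // counts[offsets.length] is the overflow bucket

        BucketHistogram(int bucketCount) {
            offsets = new long[bucketCount];
            long last = 1;
            offsets[0] = last;
            for (int i = 1; i < bucketCount; i++) {
                // Grow each bound by roughly 20%, and always by at least 1.
                long next = Math.round(last * 1.2);
                if (next == last) next++;
                offsets[i] = next;
                last = next;
            }
            counts = new long[bucketCount + 1];
        }

        void add(long value) {
            int idx = Arrays.binarySearch(offsets, value);
            if (idx < 0) idx = -idx - 1; // insertion point: first bound >= value
            counts[idx]++;               // idx == offsets.length means overflow
        }

        boolean isOverflowed() {
            return counts[counts.length - 1] > 0;
        }

        long largestOffset() {
            return offsets[offsets.length - 1];
        }

        long mean() {
            // An overflowed sample has no known upper bound, so no mean can be estimated.
            if (isOverflowed())
                throw new IllegalStateException("histogram overflowed; mean is undefined");
            long elements = 0;
            long sum = 0;
            for (int i = 0; i < offsets.length; i++) {
                elements += counts[i];
                sum += counts[i] * offsets[i];
            }
            return elements == 0 ? 0 : (long) Math.ceil((double) sum / elements);
        }
    }

    public static void main(String[] args) {
        BucketHistogram h = new BucketHistogram(10);
        h.add(1);
        System.out.println("mean before overflow: " + h.mean());

        // Far below Long.MAX_VALUE, but above the largest bucket bound.
        h.add(h.largestOffset() + 1);
        System.out.println("overflowed: " + h.isOverflowed());
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean failed: " + e.getMessage());
        }
    }
}
```

If the real histogram behaves like this sketch, the thing to check would be 
whether timeDelta can exceed the largest bucket bound, rather than whether a 
sum overflows a long.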




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
