Jeff Griffith created CASSANDRA-11751:
-----------------------------------------

             Summary: Histogram overflow in metrics
                 Key: CASSANDRA-11751
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Cassandra 2.2.6 on Linux
            Reporter: Jeff Griffith


One particular histogram in the Cassandra metrics seems to overflow, which prevents the mean from being calculated on the Dropwizard "Snapshot". Here is the exception that comes from the metrics library:

{code}
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
        at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
        at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
        at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
{code}

On deeper analysis, this seems to happen specifically on this metric:

{code}
ColUpdateTimeDeltaHistogram
{code}

I think this is where it is updated, in ColumnFamilyStore.java:

{code}
    public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
    {
        long start = System.nanoTime();

        Memtable mt = data.getMemtableFor(opGroup, replayPosition);
        final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
        maybeUpdateRowCache(key);
        metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
        metric.writeLatency.addNano(System.nanoTime() - start);
        if(timeDelta < Long.MAX_VALUE)
            metric.colUpdateTimeDeltaHistogram.update(timeDelta);
    }
{code}

Since it is calculating a mean, I don't know whether a large sum might be overflowing. But that "if (timeDelta < Long.MAX_VALUE)" looks suspect, doesn't it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
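
To illustrate the suspicion about the guard, here is a minimal sketch of how one very large sample could make the mean uncomputable. It is a simplified, hypothetical model, not the actual EstimatedHistogram code: the class name `OverflowSketch`, its three bucket bounds, and the way the mean is estimated are all made up for illustration. The only assumptions taken from the exception message are that values above the largest bucket bound land in a final overflow bucket and that the mean is refused once that bucket is non-empty.

```java
import java.util.Arrays;

/** Hypothetical, simplified bucketed histogram; NOT Cassandra's EstimatedHistogram. */
final class OverflowSketch {
    private final long[] offsets; // ascending upper bounds of the regular buckets
    private final long[] counts;  // counts[offsets.length] is the overflow bucket

    OverflowSketch(long[] offsets) {
        this.offsets = offsets.clone();
        this.counts = new long[offsets.length + 1];
    }

    /** Counts the value in the first bucket whose bound is >= value, else in the overflow bucket. */
    void update(long value) {
        int i = Arrays.binarySearch(offsets, value);
        if (i < 0) {
            i = -i - 1; // insertion point; equals offsets.length when value exceeds every bound
        }
        counts[i]++;
    }

    boolean isOverflowed() {
        return counts[counts.length - 1] > 0;
    }

    /** Estimates the mean from bucket upper bounds; refuses once anything has overflowed. */
    double mean() {
        if (isOverflowed()) {
            throw new IllegalStateException("histogram overflowed; cannot estimate mean");
        }
        long n = 0;
        double sum = 0.0;
        for (int i = 0; i < offsets.length; i++) {
            n += counts[i];
            sum += (double) counts[i] * offsets[i];
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        OverflowSketch h = new OverflowSketch(new long[] {10, 100, 1000}); // made-up bounds
        h.update(5);
        h.update(50);
        System.out.println("mean before overflow: " + h.mean());

        long timeDelta = Long.MAX_VALUE - 1; // still passes a "< Long.MAX_VALUE" guard
        if (timeDelta < Long.MAX_VALUE) {
            h.update(timeDelta);
        }
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean after overflow: " + e.getMessage());
        }
    }
}
```

In this model, a guard of the form "value < Long.MAX_VALUE" only filters out the single sentinel value. Any other sample above the largest bucket bound still reaches the overflow bucket, and from then on every call to mean() throws. Whether the real histogram behaves the same way would need to be confirmed against the Cassandra source.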