Jeff Griffith created CASSANDRA-11751:
-----------------------------------------
Summary: Histogram overflow in metrics
Key: CASSANDRA-11751
URL: https://issues.apache.org/jira/browse/CASSANDRA-11751
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Cassandra 2.2.6 on Linux
Reporter: Jeff Griffith
One particular histogram in the Cassandra metrics seems to overflow, which
prevents the mean from being calculated on the Dropwizard "Snapshot". Here is
the exception raised when the metrics reporter runs:
{code}
java.lang.IllegalStateException: Unable to compute ceiling for max when histogram overflowed
    at org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:232) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
    at org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) ~[apache-cassandra-2.2.6.jar:2.2.6-SNAPSHOT]
    at com.addthis.metrics3.reporter.config.SplunkReporter.reportHistogram(SplunkReporter.java:155) ~[reporter-config3-3.0.0.jar:3.0.0]
    at com.addthis.metrics3.reporter.config.SplunkReporter.report(SplunkReporter.java:101) ~[reporter-config3-3.0.0.jar:3.0.0]
    at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
    at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_72]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_72]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_72]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
{code}
On deeper analysis, it seems like this is happening specifically on this metric:
{code}
ColUpdateTimeDeltaHistogram
{code}
I think this is where it is updated, in ColumnFamilyStore.java:
{code}
public void apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
{
    long start = System.nanoTime();
    Memtable mt = data.getMemtableFor(opGroup, replayPosition);
    final long timeDelta = mt.put(key, columnFamily, indexer, opGroup);
    maybeUpdateRowCache(key);
    metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
    metric.writeLatency.addNano(System.nanoTime() - start);
    if(timeDelta < Long.MAX_VALUE)
        metric.colUpdateTimeDeltaHistogram.update(timeDelta);
}
{code}
Since it is calculating a mean, I wonder whether a large sum might be
overflowing. But that "if (timeDelta < Long.MAX_VALUE)" check looks suspect,
doesn't it?
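To make the suspicion concrete, here is a minimal, self-contained sketch of how a bucketed histogram with a trailing overflow bucket could behave. It is not the actual EstimatedHistogram code: the class name OverflowSketch, the bucket bounds, and the simplified mean calculation are all assumptions made for illustration. It only shows that a value just below Long.MAX_VALUE passes the guard quoted above, lands in the overflow bucket, and from then on makes the mean uncomputable.

```java
/**
 * Hypothetical, simplified model of a bucketed histogram with an overflow
 * bucket. This is NOT Cassandra's EstimatedHistogram; names and bucket
 * bounds are assumptions for illustration only.
 */
final class OverflowSketch {
    private final long[] offsets; // upper bounds of the regular buckets
    private final long[] counts;  // one extra slot at the end: the overflow bucket

    OverflowSketch(long[] offsets) {
        this.offsets = offsets.clone();
        this.counts = new long[offsets.length + 1];
    }

    /** Counts the value in the first bucket whose bound is >= value, else in overflow. */
    void update(long value) {
        int i = 0;
        while (i < offsets.length && value > offsets[i]) {
            i++;
        }
        counts[i]++;
    }

    boolean isOverflowed() {
        return counts[counts.length - 1] > 0;
    }

    /** Estimates the mean from bucket upper bounds; refuses once anything overflowed. */
    double mean() {
        if (isOverflowed()) {
            throw new IllegalStateException(
                "Unable to compute ceiling for max when histogram overflowed");
        }
        long n = 0;
        double sum = 0.0;
        for (int i = 0; i < offsets.length; i++) {
            n += counts[i];
            sum += (double) counts[i] * offsets[i];
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        OverflowSketch h = new OverflowSketch(new long[] {1, 10, 100, 1000});
        h.update(5);
        h.update(50);
        System.out.println("mean before overflow: " + h.mean());

        // A huge delta that is still strictly below Long.MAX_VALUE passes the guard.
        long timeDelta = Long.MAX_VALUE - 1;
        if (timeDelta < Long.MAX_VALUE) {
            h.update(timeDelta);
        }
        try {
            h.mean();
        } catch (IllegalStateException e) {
            System.out.println("mean after overflow: " + e.getMessage());
        }
    }
}
```

If the real histogram behaves like this model, the guard would only filter out the exact sentinel value Long.MAX_VALUE, and any other delta larger than the top bucket bound would be enough to trigger the exception on every later report.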