[
https://issues.apache.org/jira/browse/CASSANDRA-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Edward Ribeiro updated CASSANDRA-11823:
---------------------------------------
Attachment: CASSANDRA-11823.patch
Hi [~ostefano] and [~Stefania],
I took a stab at this issue, and I think I've found the root cause of the
problem. I am providing a patch for the cassandra-3.0 branch.
*IMHO*, it looks like when a table is created, the metrics {{Set}} stored under
a given key in {{TableMetrics.allTableMetrics}} is updated while that same
{{Set}} is being iterated to compute the summarized value passed to
{{GraphiteReporter}}, for example:
{code}
public Long getValue()
{
    long total = 0;
    for (Metric cfGauge : allTableMetrics.get(name))
    {
        total = total + ((Gauge<? extends Number>) cfGauge).getValue().longValue();
    }
    return total;
}
{code}
Even though {{allTableMetrics}} is a thread-safe {{ConcurrentMap}}, *the {{Set}}
iterated in the for-loop above is not!* At first glance, the
{{ConcurrentModificationException}} seems to report a {{Map}} as the offending
collection instead of the {{Set}} inside the {{Map}} that is actually being
iterated. That is most likely because a {{HashSet}} is backed by a {{HashMap}}
internally, so its iterator shows up as {{HashMap$KeyIterator}} in the stack
trace.
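To illustrate why the trace points at {{HashMap}}, here is a minimal, single-threaded sketch. It is not the Cassandra code: the class and method names are made up for illustration, and it triggers the fail-fast check by modifying a plain {{HashSet}} inside its own for-each loop rather than from a second thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of Cassandra.
class HashSetIterationSketch
{
    // Returns the class name of the top stack frame of the
    // ConcurrentModificationException, or null if none was thrown.
    static String iterateWhileAdding()
    {
        Set<Integer> gauges = new HashSet<>();
        gauges.add(1);
        gauges.add(2);
        gauges.add(3);
        try
        {
            for (Integer g : gauges)
                gauges.add(g + 100); // structural modification during iteration
        }
        catch (ConcurrentModificationException e)
        {
            return e.getStackTrace()[0].getClassName();
        }
        return null;
    }

    public static void main(String[] args)
    {
        // The printed class is the iterator implementation that detected the change.
        System.out.println(iterateWhileAdding());
    }
}
```

The top frame belongs to the iterator of the backing {{HashMap}}, which matches the {{HashMap$HashIterator.nextNode}} frame in the reported trace.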
*If this is the case*, the solution is to create a thread-safe {{Set}}.
{{Collections#synchronizedSet}} will not work on its own, because iteration over
it still has to be synchronized manually by the caller, but fortunately, we can
also create a thread-safe {{Set}} backed by a {{ConcurrentHashMap}}.
Before Java 8, we could do this with {{Collections#newSetFromMap}}, as shown here:
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29
But as C* uses Java 8, this can be done with {{ConcurrentHashMap#newKeySet}}, as shown here:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#newKeySet--
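As a rough sketch of the idea (not the attached patch), the per-name {{Set}} could be created with {{ConcurrentHashMap.newKeySet()}}. The class {{GaugeRegistrySketch}}, its methods, and the use of {{LongSupplier}} in place of the metrics {{Gauge}} type are assumptions made only to keep the example self-contained.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the per-name gauge registry; not Cassandra's TableMetrics.
class GaugeRegistrySketch
{
    private final ConcurrentMap<String, Set<LongSupplier>> gaugesByName = new ConcurrentHashMap<>();

    // Adds a gauge under the given name, creating a concurrent Set on first use.
    void register(String name, LongSupplier gauge)
    {
        gaugesByName.computeIfAbsent(name, k -> ConcurrentHashMap.newKeySet()).add(gauge);
    }

    // Sums all gauges registered under the name. The iterator of the concurrent
    // Set is weakly consistent, so a concurrent register() does not throw
    // ConcurrentModificationException; a gauge added mid-iteration may or may
    // not be included in this particular total.
    long total(String name)
    {
        Set<LongSupplier> gauges = gaugesByName.get(name);
        if (gauges == null)
            return 0;
        long total = 0;
        for (LongSupplier gauge : gauges)
            total += gauge.getAsLong();
        return total;
    }
}
```

The trade-off is that a reported total is only a snapshot that may miss a gauge registered while the sum is being computed, which seems acceptable for periodic reporting.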
Of course, I could be chasing my own tail (it would not be the first time, lol)
and the problem may have *nothing* to do with what I described above, so please
let me know what you think. :)
> Creating a table leads to a race with GraphiteReporter
> ------------------------------------------------------
>
> Key: CASSANDRA-11823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11823
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
> Priority: Minor
> Labels: lhf
> Attachments: CASSANDRA-11823.patch
>
>
> Happened only on 3/4 nodes out of 13.
> {code:xml}
> INFO [MigrationStage:1] 2016-05-18 00:34:11,566 ColumnFamilyStore.java:381 - Initializing schema.table
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-18 00:34:11,569 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.util.ConcurrentModificationException: null
>     at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) ~[na:1.8.0_91]
>     at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_91]
>     at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:690) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}