[ 
https://issues.apache.org/jira/browse/CASSANDRA-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Ribeiro updated CASSANDRA-11823:
---------------------------------------
    Attachment: CASSANDRA-11823.patch

Hi [~ostefano] and [~Stefania], 

I took a stab at this issue, and I think I have found the root cause of the 
problem. I am attaching a patch for the cassandra-3.0 branch.

*IMHO*, when a table is created, the metrics {{Set}} stored under a given key 
in {{TableMetrics.allTableMetrics}} is modified while that same {{Set}} is 
being iterated to compute a summarized value for {{GraphiteReporter}}, as in 
this example:

{code}
            public Long getValue()
            {
                long total = 0;
                for (Metric cfGauge : allTableMetrics.get(name))
                {
                    total = total + ((Gauge<? extends Number>) cfGauge).getValue().longValue();
                }
                return total;
            }
{code}

Even though {{allTableMetrics}} is a thread-safe {{ConcurrentMap}}, *the {{Set}} 
iterated in the for-loop above is not!* The stack trace of the 
{{ConcurrentModificationException}} points at {{HashMap}} iterator classes 
rather than at a {{Set}}, but that is most likely because a {{HashSet}} is 
backed by a {{HashMap}} internally, so iterating the {{Set}} inside the map 
goes through {{HashMap$KeyIterator}}.
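
To illustrate the point about the stack trace, here is a minimal, single-threaded sketch. It assumes the inner {{Set}} is a plain {{java.util.HashSet}} (the trace suggests so, but I have not confirmed it here). The class and method names are hypothetical and are not part of Cassandra.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

public class HashSetCmeDemo
{
    // Returns the class name of the iterator handed out by a HashSet.
    // HashSet delegates to the key set of its internal HashMap.
    static String iteratorClassName()
    {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        return set.iterator().getClass().getName();
    }

    // Adds an element while a for-each loop is iterating the same set.
    // Returns true if a ConcurrentModificationException was thrown.
    static boolean failsOnAddDuringIteration()
    {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        try
        {
            for (Integer i : set)
                set.add(i + 100); // structural modification during iteration
            return false;
        }
        catch (ConcurrentModificationException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println("iterator class: " + iteratorClassName());
        System.out.println("CME thrown: " + failsOnAddDuringIteration());
    }
}
```

In the real bug the modification would come from another thread (the one registering the new table's metrics), so the failure is timing dependent, which would fit with it showing up on only some nodes.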

*If this is the case*, the solution is to use a thread-safe {{Set}}. 
{{Collections#synchronizedSet}} will not work, because iterating over it still 
requires the caller to synchronize manually, but fortunately we can create a 
thread-safe {{Set}} backed by a {{ConcurrentHashMap}}.
Before Java 8, we could do this as shown here: 
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29

But since C* uses Java 8, this can be done as described here: 
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#newKeySet--
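
A minimal sketch of both options, not the attached patch itself. The class and method names are hypothetical; only {{Collections.newSetFromMap}} and {{ConcurrentHashMap.newKeySet}} are real JDK calls.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo
{
    // Java 8 style: a Set view backed by a new ConcurrentHashMap.
    static <T> Set<T> java8Style()
    {
        return ConcurrentHashMap.newKeySet();
    }

    // Pre-Java 8 style: wrap a ConcurrentHashMap explicitly.
    static <T> Set<T> java7Style()
    {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }

    // Iterates the set while adding to it. With a HashSet this would throw
    // ConcurrentModificationException; the iterators of these sets are
    // weakly consistent and never throw it. Whether the added element is
    // seen by the running iteration is not guaranteed.
    static long sumWhileAdding(Set<Long> set)
    {
        set.add(1L);
        set.add(2L);
        long total = 0;
        for (Long value : set)
        {
            total += value;
            set.add(1000L);
        }
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(sumWhileAdding(ConcurrentSetDemo.<Long>java8Style()));
        System.out.println(sumWhileAdding(ConcurrentSetDemo.<Long>java7Style()));
    }
}
```

The trade-off is that a reporter iterating such a set may miss a gauge that was registered during the iteration, which seems acceptable for a periodically reported summary.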

Of course, I may be chasing my own tail (it would not be the first time, lol) 
and the problem may have *nothing* to do with what I described above, so 
please let me know what you think. :)

> Creating a table leads to a race with GraphiteReporter
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11823
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefano Ortolani
>            Priority: Minor
>              Labels: lhf
>         Attachments: CASSANDRA-11823.patch
>
>
> Happened only on 3/4 nodes out of 13.
> {code:xml}
> INFO  [MigrationStage:1] 2016-05-18 00:34:11,566 ColumnFamilyStore.java:381 - Initializing schema.table
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-18 00:34:11,569 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.util.ConcurrentModificationException: null
>       at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) ~[na:1.8.0_91]
>       at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_91]
>       at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:690) ~[apache-cassandra-3.0.6.jar:3.0.6]
>       at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
>       at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>       at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
>       at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
