[
https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986534#action_12986534
]
Thibaut commented on CASSANDRA-2037:
------------------------------------
Both. But there might be many requests during the first few seconds when I
restart our application. The cluster has 20 nodes.
I disabled JNA, but this didn't help. I still see sudden spikes where Cassandra
takes up an enormous amount of CPU (uptime load > 1000).
Jstack won't work anymore:
-bash-4.1# jstack 27699 > /tmp/jstackerror
27699: Unable to open socket file: target process not responding or HotSpot VM
not loaded
The -F option can be used when the target process is not responding
Also, my entire application comes to a halt because the node is still marked as
up but won't respond (Cassandra is taking up all the CPU on the first node):
/software/cassandra/bin/nodetool -h localhost ring
Address        Status  State   Load     Owns    Token
                                                ffffffffffffffff
192.168.0.1    Up      Normal  3.48 GB  5.00%   0cc
192.168.0.2    Up      Normal  3.48 GB  5.00%   199
192.168.0.3    Up      Normal  3.67 GB  5.00%   266
192.168.0.4    Up      Normal  2.55 GB  5.00%   333
192.168.0.5    Up      Normal  2.58 GB  5.00%   400
192.168.0.6    Up      Normal  2.54 GB  5.00%   4cc
192.168.0.7    Up      Normal  2.59 GB  5.00%   599
192.168.0.8    Up      Normal  2.58 GB  5.00%   666
192.168.0.9    Up      Normal  2.33 GB  5.00%   733
192.168.0.10   Down    Normal  2.39 GB  5.00%   7ff
192.168.0.11   Up      Normal  2.4 GB   5.00%   8cc
192.168.0.12   Up      Normal  2.74 GB  5.00%   999
192.168.0.13   Up      Normal  3.17 GB  5.00%   a66
192.168.0.14   Up      Normal  3.25 GB  5.00%   b33
192.168.0.15   Up      Normal  3.01 GB  5.00%   c00
192.168.0.16   Up      Normal  2.48 GB  5.00%   ccc
192.168.0.17   Up      Normal  2.41 GB  5.00%   d99
192.168.0.18   Up      Normal  2.3 GB   5.00%   e66
192.168.0.19   Up      Normal  2.27 GB  5.00%   f33
192.168.0.20   Up      Normal  2.32 GB  5.00%   ffffffffffffffff
The interesting part is that after a while (seconds or minutes), I have seen
Cassandra nodes return to a normal state again (without a restart). I have also
never seen this happen on two nodes at the same time in the cluster (the node
where it happens differs, but most of the time it seems to happen on the first
node).
In the above case, I restarted node 192.168.0.10 and the first node returned
to a normal state (I don't know whether there is a correlation).
I attached the jstack output of the node in trouble (taken as soon as I could
access it with jstack, but I suspect this is the output from when the node was
running normally again).
> Unsafe Multimap Access in MessagingService
> ------------------------------------------
>
> Key: CASSANDRA-2037
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2037
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Erik Onnen
> Priority: Critical
>
> MessagingService is a system singleton with a static Multimap field, targets.
> Multimaps are not thread safe, but no attempt is made to synchronize access to
> that field. The Multimap is ultimately backed by the standard java.util.HashMap,
> which is susceptible to a race condition under concurrent modification where
> threads get stuck inside a get operation, yielding multiple threads with stacks
> similar to the following:
> "pool-1-thread-6451" prio=10 tid=0x00007fa5242c9000 nid=0x10f4 runnable
> [0x00007fa52fde4000]
> java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at
> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:205)
> at
> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
> at
> com.google.common.collect.AbstractListMultimap.put(AbstractListMultimap.java:72)
> at
> com.google.common.collect.ArrayListMultimap.put(ArrayListMultimap.java:60)
> at
> org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:303)
> at
> org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:353)
> at
> org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
> at
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:98)
> at
> org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:289)
> at
> org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
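For illustration only (this is not the patch attached to this ticket): one way to make a shared multimap like this safe for concurrent senders is Guava's synchronized wrapper. The class name and the String/InetAddress key and value types below are assumptions for the sketch, not the actual MessagingService fields.

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;
import com.google.common.collect.Multimaps;
import java.net.InetAddress;
import java.util.List;

public final class SynchronizedTargets
{
    // Hypothetical stand-in for the static targets field: message id -> endpoints.
    // Multimaps.synchronizedListMultimap routes every call through a single monitor,
    // so the backing HashMap is never structurally modified by two threads at once
    // and get() cannot spin on a corrupted bucket chain.
    private static final ListMultimap<String, InetAddress> targets =
            Multimaps.synchronizedListMultimap(
                    ArrayListMultimap.<String, InetAddress>create());

    public static void put(String messageId, InetAddress endpoint)
    {
        targets.put(messageId, endpoint); // guarded by the wrapper's monitor
    }

    public static List<InetAddress> removeAll(String messageId)
    {
        return targets.removeAll(messageId); // also guarded
    }

    private SynchronizedTargets() {}
}

Note that individual calls such as put and removeAll are guarded by the wrapper, but iterating over its collection views would still need external synchronization.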