[
https://issues.apache.org/jira/browse/CASSANDRA-2037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986534#action_12986534
]
Thibaut commented on CASSANDRA-2037:
------------------------------------
Both. But there might be many requests during the first few seconds when I
restart our application. The cluster has 20 nodes.
I disabled JNA, but this didn't help. I still see sudden spikes where Cassandra
takes up an enormous amount of CPU (uptime load > 1000).
Jstack won't work anymore:
-bash-4.1# jstack 27699 > /tmp/jstackerror
27699: Unable to open socket file: target process not responding or HotSpot VM
not loaded
The -F option can be used when the target process is not responding
Also, my entire application comes to a halt because the node is still marked as
up but won't respond (Cassandra is taking up all the CPU on the first node):
/software/cassandra/bin/nodetool -h localhost ring
Address        Status  State   Load     Owns    Token
                                                ffffffffffffffff
192.168.0.1    Up      Normal  3.48 GB  5.00%   0cc
192.168.0.2    Up      Normal  3.48 GB  5.00%   199
192.168.0.3    Up      Normal  3.67 GB  5.00%   266
192.168.0.4    Up      Normal  2.55 GB  5.00%   333
192.168.0.5    Up      Normal  2.58 GB  5.00%   400
192.168.0.6    Up      Normal  2.54 GB  5.00%   4cc
192.168.0.7    Up      Normal  2.59 GB  5.00%   599
192.168.0.8    Up      Normal  2.58 GB  5.00%   666
192.168.0.9    Up      Normal  2.33 GB  5.00%   733
192.168.0.10   Down    Normal  2.39 GB  5.00%   7ff
192.168.0.11   Up      Normal  2.4 GB   5.00%   8cc
192.168.0.12   Up      Normal  2.74 GB  5.00%   999
192.168.0.13   Up      Normal  3.17 GB  5.00%   a66
192.168.0.14   Up      Normal  3.25 GB  5.00%   b33
192.168.0.15   Up      Normal  3.01 GB  5.00%   c00
192.168.0.16   Up      Normal  2.48 GB  5.00%   ccc
192.168.0.17   Up      Normal  2.41 GB  5.00%   d99
192.168.0.18   Up      Normal  2.3 GB   5.00%   e66
192.168.0.19   Up      Normal  2.27 GB  5.00%   f33
192.168.0.20   Up      Normal  2.32 GB  5.00%   ffffffffffffffff
The interesting part is that after a while (seconds or minutes), I have seen
Cassandra nodes return to a normal state again (without a restart). I have also
never seen this happen on two nodes at the same time in the cluster (the node
where it happens differs, but most of the time it seems to happen on the first
node).
In the above case, I restarted node 192.168.0.10 and the first node returned
to a normal state (I don't know whether there is a correlation).
I attached the jstack output of the node in trouble (taken as soon as I could
access it with jstack, but I suspect this is the output from when the node was
running normally again).
> Unsafe Multimap Access in MessagingService
> ------------------------------------------
>
> Key: CASSANDRA-2037
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2037
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.0
> Reporter: Erik Onnen
> Priority: Critical
>
> MessagingService is a system singleton with a static Multimap field, targets.
> Multimaps are not thread safe, but no attempt is made to synchronize access to
> that field. The Multimap is ultimately backed by the standard java.util.HashMap,
> which is susceptible to a race condition under concurrent modification where
> threads get stuck inside a get operation, yielding multiple threads with stacks
> similar to the following:
> "pool-1-thread-6451" prio=10 tid=0x00007fa5242c9000 nid=0x10f4 runnable
> [0x00007fa52fde4000]
> java.lang.Thread.State: RUNNABLE
> at java.util.HashMap.get(HashMap.java:303)
> at
> com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:205)
> at
> com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:194)
> at
> com.google.common.collect.AbstractListMultimap.put(AbstractListMultimap.java:72)
> at
> com.google.common.collect.ArrayListMultimap.put(ArrayListMultimap.java:60)
> at
> org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:303)
> at
> org.apache.cassandra.service.StorageProxy.strongRead(StorageProxy.java:353)
> at
> org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:229)
> at
> org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:98)
> at
> org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:289)
> at
> org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
> at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
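For illustration only (this is not the patch attached to this ticket): one way to make a shared multimap like this safe for concurrent senders is Guava's synchronized wrapper. The class name and the String/InetAddress key and value types below are assumptions for the sketch, not the actual MessagingService fields.

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;
import com.google.common.collect.Multimaps;
import java.net.InetAddress;
import java.util.List;

public final class SynchronizedTargets
{
    // Hypothetical stand-in for the static targets field: message id -> endpoints.
    // Multimaps.synchronizedListMultimap routes every call through a single monitor,
    // so the backing HashMap is never structurally modified by two threads at once
    // and get() cannot spin on a corrupted bucket chain.
    private static final ListMultimap<String, InetAddress> targets =
            Multimaps.synchronizedListMultimap(
                    ArrayListMultimap.<String, InetAddress>create());

    public static void put(String messageId, InetAddress endpoint)
    {
        targets.put(messageId, endpoint); // guarded by the wrapper's monitor
    }

    public static List<InetAddress> removeAll(String messageId)
    {
        return targets.removeAll(messageId); // also guarded
    }

    private SynchronizedTargets() {}
}

Note that individual calls such as put and removeAll are guarded by the wrapper, but iterating over its collection views would still need external synchronization.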