[ https://issues.apache.org/jira/browse/CASSANDRA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987117#action_12987117 ]

Erik Onnen commented on CASSANDRA-2054:
---------------------------------------

Sorry to butt in.  Often, a kill -3 will still work when jstack does not. With 
that approach the output goes to stdout, so it can be harder to locate, but IME 
it's often more reliable.
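
For example (the log path below is only an illustration; where stdout actually 
goes depends on how Cassandra was started):

  # SIGQUIT makes the JVM print a full thread dump to its own stdout
  kill -3 27699
  # if stdout was redirected at startup, the dump lands in that file instead,
  # e.g. something like /var/log/cassandra/output.log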

I'm curious: does this always happen on the same node? Even in the "normal" 
state there are over 100 threads waiting on the JRE to do basic slab allocation 
tasks. Stacks like these are all over the thread dump:

"pool-1-thread-1523" prio=10 tid=0x00007f44ec6e7000 nid=0x366c runnable 
[0x00007f4274ee8000]
   java.lang.Thread.State: RUNNABLE
        at 
org.cliffc.high_scale_lib.NonBlockingHashMap.initialize(NonBlockingHashMap.java:259)
        at org.cliffc.high_scale_lib.NonBlockingHashMap. 
(NonBlockingHashMap.java:250)
        at org.cliffc.high_scale_lib.NonBlockingHashMap. 
(NonBlockingHashMap.java:243)
        at org.cliffc.high_scale_lib.NonBlockingHashSet. 
(NonBlockingHashSet.java:26)
        at 
org.apache.cassandra.net.MessagingService.putTarget(MessagingService.java:274)
        at 
org.apache.cassandra.net.MessagingService.sendRR(MessagingService.java:336)
        at 
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:381)
        at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:315)
        at 
org.apache.cassandra.thrift.CassandraServer.readColumnFamily(CassandraServer.java:98)
        at 
org.apache.cassandra.thrift.CassandraServer.get(CassandraServer.java:289)
        at 
org.apache.cassandra.thrift.Cassandra$Processor$get.process(Cassandra.java:2655)
        at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
        at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

That particular code is just allocating an array; it should be very fast, and I 
wouldn't expect to see > 100 threads waiting on memory allocation unless 
there's a problem in the underlying host (bad RAM) or the JVM.
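
To make that concrete, here is a rough sketch (not the actual high-scale-lib 
source, just the same kind of work) of what a frame like 
NonBlockingHashMap.<init> / initialize() boils down to:

// A sketch of the work done by those constructor frames: round the requested
// size up to a power of two and allocate the backing Object[] array.
// The class and field names here are made up for illustration.
final class BackingArraySketch {
    private Object[] kvs;

    void initialize(int initialSize) {
        int cap = 4;
        while (cap < initialSize) {
            cap <<= 1; // next power of two
        }
        // a single array allocation: interleaved key/value slots plus a
        // couple of header slots; normally this takes microseconds
        kvs = new Object[(cap << 1) + 2];
    }

    public static void main(String[] args) {
        new BackingArraySketch().initialize(8); // allocates an 18-slot array
    }
}

If dozens of threads are parked inside work like that, the allocation path 
itself is stalling, which points at the JVM or the host rather than Cassandra.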

> Cpu Spike to > 100%. 
> ---------------------
>
>                 Key: CASSANDRA-2054
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2054
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Thibaut
>         Attachments: gc.log, jstack.txt, jstackerror.txt
>
>
> I see sudden spikes of CPU usage where Cassandra will take up an enormous 
> amount of CPU (uptime load > 1000).
> My application executes both reads and writes.
> I tested this with 
> https://hudson.apache.org/hudson/job/Cassandra-0.7/193/artifact/cassandra/build/apache-cassandra-2011-01-24_06-01-26-bin.tar.gz.
> I disabled JNA, but this didn't help.
> Jstack won't work anymore when this happens:
> -bash-4.1# jstack 27699 > /tmp/jstackerror
> 27699: Unable to open socket file: target process not responding or HotSpot 
> VM not loaded
> The -F option can be used when the target process is not responding
> Also, my entire application comes to a halt as long as the node is in this 
> state: the node is still marked as up, but it won't respond to any requests 
> (Cassandra is taking up all the CPU on the first node).
> /software/cassandra/bin/nodetool -h localhost ring
> Address Status State Load Owns Token
> ffffffffffffffff
> 192.168.0.1 Up Normal 3.48 GB 5.00% 0cc
> 192.168.0.2 Up Normal 3.48 GB 5.00% 199
> 192.168.0.3 Up Normal 3.67 GB 5.00% 266
> 192.168.0.4 Up Normal 2.55 GB 5.00% 333
> 192.168.0.5 Up Normal 2.58 GB 5.00% 400
> 192.168.0.6 Up Normal 2.54 GB 5.00% 4cc
> 192.168.0.7 Up Normal 2.59 GB 5.00% 599
> 192.168.0.8 Up Normal 2.58 GB 5.00% 666
> 192.168.0.9 Up Normal 2.33 GB 5.00% 733
> 192.168.0.10 Down Normal 2.39 GB 5.00% 7ff
> 192.168.0.11 Up Normal 2.4 GB 5.00% 8cc
> 192.168.0.12 Up Normal 2.74 GB 5.00% 999
> 192.168.0.13 Up Normal 3.17 GB 5.00% a66
> 192.168.0.14 Up Normal 3.25 GB 5.00% b33
> 192.168.0.15 Up Normal 3.01 GB 5.00% c00
> 192.168.0.16 Up Normal 2.48 GB 5.00% ccc
> 192.168.0.17 Up Normal 2.41 GB 5.00% d99
> 192.168.0.18 Up Normal 2.3 GB 5.00% e66
> 192.168.0.19 Up Normal 2.27 GB 5.00% f33
> 192.168.0.20 Up Normal 2.32 GB 5.00% ffffffffffffffff
> The interesting part is that after a while (seconds or minutes), I have seen 
> Cassandra nodes return to a normal state again (without a restart). I have 
> also never seen this happen on 2 nodes at the same time in the cluster (the 
> node where it happens differs, but it seems to happen on the first node most 
> of the time).
> In the above case, I restarted node 192.168.0.10 and the first node returned 
> to a normal state. (I don't know if there is a correlation.)
> I attached the jstack of the node in trouble (taken as soon as I could access 
> it with jstack, but I suspect this is the jstack from when the node was 
> running normally again).
> The heap usage is still moderate:
> /software/cassandra/bin/nodetool -h localhost info
> 0cc
> Gossip active    : true
> Load             : 3.49 GB
> Generation No    : 1295949691
> Uptime (seconds) : 42843
> Heap Memory (MB) : 1570.58 / 3005.38
> I will enable the GC logging tomorrow.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.