Load spikes on coordinators since upgrade from 0.6.8 to 0.7
-----------------------------------------------------------
Key: CASSANDRA-2357
URL: https://issues.apache.org/jira/browse/CASSANDRA-2357
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.7.4
Reporter: Jason Harvey
Attachments: thread_dump.txt
Since our move from 0.6.8 to 0.7, every node that clients talk to has been
having periodic, abrupt load spikes, with load climbing into the hundreds. The
spikes occur 1 to 2 times per hour on each of these nodes. The load graph for a
typical spike:
http://i.imgur.com/jY8AV.png
I have verified via TCP statistics that client connections are not spiking at
the same time. I have also verified that we aren't seeing any spikes in
reads/mutations/etc.
We were using the DynamicSnitch, but I turned that off as a troubleshooting
step. The issue was unchanged.
When the spikes occur, the box's responsiveness slows to a crawl so I am unable
to do much in terms of real-time diagnostics. I was able to get a thread dump a
few seconds after a spike, which I have attached to this ticket. Not sure if it
will show anything since I couldn't capture it immediately during the spike.
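One way to get a dump closer to the spike itself might be a small watcher that
polls the load average and runs jstack as soon as it crosses a threshold. The
following is only a minimal sketch of that idea, not something we have run in
production. The function names, the threshold of 50, the one-second polling
interval, and the output file naming are all assumptions made for illustration,
and it assumes jstack is on the PATH and the script runs as the user that owns
the Cassandra JVM.

```python
import os
import subprocess
import time
from datetime import datetime


def should_capture(load_1min, threshold):
    """Return True when the 1-minute load average is at or above the threshold."""
    return load_1min >= threshold


def dump_path(out_dir, now):
    """Build a timestamped file name for one thread dump (naming is hypothetical)."""
    return os.path.join(out_dir, "thread_dump_" + now.strftime("%Y%m%dT%H%M%S") + ".txt")


def capture_thread_dump(pid, out_dir):
    """Run `jstack <pid>` and write its output to a timestamped file."""
    path = dump_path(out_dir, datetime.now())
    with open(path, "w") as out:
        # stderr is merged in so a failed attach is recorded in the same file.
        subprocess.call(["jstack", str(pid)], stdout=out, stderr=subprocess.STDOUT)
    return path


def watch(pid, threshold=50.0, interval=1.0, out_dir="."):
    """Poll the load average forever; dump threads on every sample over the threshold."""
    while True:
        load_1min = os.getloadavg()[0]  # Unix only
        if should_capture(load_1min, threshold):
            capture_thread_dump(pid, out_dir)
        time.sleep(interval)
```

It would be started ahead of time with something like watch(pid=12345), where
12345 stands in for the Cassandra process ID. Since the 1-minute load average
lags the actual event and the box is barely responsive during a spike, the
dump may still land late, but it should be closer than one taken by hand.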
I should note that David King noticed a similar problem (#2058) when he tried
moving us from 0.6.8 to 0.6.10. The main issue at the time was a long-lasting
load spike, but he also saw occasional abrupt load spikes like we are seeing
now. When we moved back to 0.6.8, we didn't see the problem again, until the
move to 0.7.
I realize this information is somewhat nebulous. If there is any further info I
can provide, please let me know. The spikes are causing quite a bit of
instability, so we are considering reverting to 0.6.8. I'd like to
investigate every possible solution before we resort to that.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira