It sounds like it's probably GC. Grep for GC in system.log to verify. If it is GC, there are myriad issues that could cause it, but at least you'll have narrowed it down.
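If a plain grep turns up too much noise, here is a rough sketch of a filter that keeps only the longer pauses. It assumes that GC pauses show up in system.log on lines containing "GCInspector" and that the pause length is the first "N ms" value on the line; the exact wording of those lines differs between Cassandra versions, so treat the pattern, the 200 ms threshold, and the function name find_gc_pauses as illustrative rather than anything official.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the pause duration is the first "<digits>ms" or "<digits> ms"
# token on a GCInspector line. Adjust if your version logs it differently.
DURATION_MS = re.compile(r"\b(\d+)\s*ms\b")


def find_gc_pauses(lines: Iterable[str], min_ms: int = 200) -> List[Tuple[int, str]]:
    """Return (pause_ms, line) for GCInspector lines with a pause >= min_ms."""
    hits = []
    for line in lines:
        # Skip everything that is not a GC report (compaction, flushes, ...).
        if "GCInspector" not in line:
            continue
        match = DURATION_MS.search(line)
        if match and int(match.group(1)) >= min_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


# Example usage (the path is an assumption; use wherever your system.log lives):
#
#   with open("/var/log/cassandra/system.log") as log:
#       for pause_ms, line in find_gc_pauses(log, min_ms=200):
#           print(pause_ms, line)
```

If the long pauses line up with the times the CPU spikes, GC is the likely culprit; if nothing shows up, it is probably worth looking elsewhere.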
On Sep 9, 2015, at 11:05 PM, Roman Tkachenko <ro...@mailgunhq.com> wrote:

> Hey guys,
>
> We've been having issues in the past couple of days with CPU usage / load
> average suddenly skyrocketing on some nodes of the cluster, affecting
> performance significantly so majority of requests start timing out. It can go
> on for several hours, with CPU spiking through the roof then coming back down
> to norm and so on. Weirdly, it affects only a subset of nodes and it's always
> the same ones. The boxes Cassandra is running on are pretty beefy, 24 cores,
> and these CPU spikes go up to >1000%.
>
> What is the best way to debug such kind of issues and find out what Cassandra
> is doing during spikes like this? Doesn't seem to be compaction related as
> sometimes during these spikes "nodetool compactionstats" says no compactions
> are running.
>
> Thanks!