On 2017-07-25 12:49 (-0700), David Salz <da...@sandbox-interactive.com> wrote: 
> Hi,
> 
> has anyone seen the following exception before?
> 
> Context:
> 
> * Cassandra 3.9,
> 
> * single node (20 Cores / 256 GB RAM)
> 

A single node with 20 cores and 256GB of RAM is probably not going to be the
best choice. While it's a great machine, the default Cassandra config really
isn't tuned for that number of cores or that much RAM (almost all of the RAM
will be left for the page cache, which is great for reads and less great for
write-heavy workloads). What sort of heap settings are you using?
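For reference, when MAX_HEAP_SIZE isn't set explicitly, cassandra-env.sh sizes
the heap itself. Here's a rough Python sketch of that calculation as I
understand it from the 3.x script (CMS defaults; the function name is just for
illustration, and G1 setups skip the new-gen part). It shows why most of 256GB
ends up outside the heap:

```python
def default_heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> tuple:
    """Approximate the default heap sizing in cassandra-env.sh (CMS case).

    Hypothetical helper for illustration; returns (max_heap_mb, heap_newsize_mb).
    """
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    # Max heap: max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))
    max_heap = max(min(half, 1024), min(quarter, 8192))
    # Young gen: the smaller of 100MB per core and 1/4 of the max heap
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size


ram_mb = 256 * 1024
heap_mb, newgen_mb = default_heap_sizes_mb(ram_mb, 20)
print(heap_mb, newgen_mb)  # 8192 2000
print(f"{(ram_mb - heap_mb) / ram_mb:.1%} of RAM is left outside the heap")
```

So on this box the default heap caps out at 8GB no matter how much RAM is
installed; everything else goes to the OS and page cache.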


> * doing lots of counter mutations
> 
> * Whenever this exception happens, CPU spikes, node becomes unresponsive
> for a few minutes. Eventually, the node will "die", i.e. become
> completely unresponsive. Restarting the node fixes it... until the next
> time :(
> 

You're getting timeouts on a single-node cluster, which usually means you're in
a GC spin, a thread is deadlocked, a thread pool is backed up, or similar. The
output of 'nodetool tpstats' may be a good starting point. Knowing whether the
node stops processing all data at this time, or just some of it, would also
help. Look for indications of a GC pause (GCInspector log lines, or even
better, actual GC logs), and if that doesn't turn anything up, post jstack
output to pastebin, a gist, or similar.
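If it helps, here's a rough Python sketch for a first pass over those: one
function pulls long pauses out of GCInspector lines in system.log, the other
flags thread pools with pending or blocked tasks in saved 'nodetool tpstats'
output. The log-line regex and the tpstats column layout are assumptions based
on typical 3.x output, and the function names and 500ms threshold are
arbitrary; check them against your own logs.

```python
import re

# Assumed GCInspector line shape, e.g.
# "INFO  [Service Thread] 2017-07-25 12:49:00,123 GCInspector.java:284 - ParNew GC in 412ms. ..."
GC_RE = re.compile(r"GCInspector\.java:\d+ - (?P<collector>.+?) GC in (?P<ms>\d+)ms")


def long_gc_pauses(log_lines, threshold_ms=500):
    """Yield (collector, pause_ms, line) for GCInspector pauses >= threshold_ms."""
    for line in log_lines:
        m = GC_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            yield m.group("collector"), int(m.group("ms")), line.rstrip()


def backed_up_pools(tpstats_text):
    """Return {pool: (active, pending, blocked)} for pools with pending or blocked tasks.

    Assumes the first table of `nodetool tpstats`: a header starting with
    "Pool Name", then rows ending in five numeric columns
    (Active, Pending, Completed, Blocked, All time blocked).
    """
    result = {}
    in_table = False
    for line in tpstats_text.splitlines():
        if line.startswith("Pool Name"):
            in_table = True
            continue
        if not in_table:
            continue
        parts = line.split()
        if not parts:
            break  # a blank line ends the thread-pool table
        numbers = parts[-5:]
        if len(parts) < 6 or not all(p.isdigit() for p in numbers):
            continue  # skip rows that don't match the assumed layout
        name = " ".join(parts[:-5])
        active, pending, _completed, blocked, _all_time = map(int, numbers)
        if pending > 0 or blocked > 0:
            result[name] = (active, pending, blocked)
    return result
```

You'd feed `long_gc_pauses` an open handle on system.log, and `backed_up_pools`
the text of a tpstats run captured while the node is struggling.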

