Hello,

More than 30 Cassandra servers in the primary DC went down with the OOM exception
shown in the logs below. What puzzles me is the scale at which it happened: they
all went down within the same minute. I will share some more details below.

System Log: http://pastebin.com/iPeYrWVR
GC Log: http://pastebin.com/CzNNGs0r

During the OOM I saw a lot of WARNings like the one below (these had been
appearing for quite some time, maybe weeks):
WARN  [SharedPool-Worker-81] 2017-03-01 19:55:41,209 BatchStatement.java:252 - Batch of prepared statements for [keyspace.table] is of size 225455, exceeding specified threshold of 65536 by 159919.
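The numbers in that warning are consistent: 225455 - 65536 = 159919 bytes, and 65536 bytes is 64 KB, which suggests batch_size_warn_threshold_in_kb is set to 64 on this cluster (an inference from the log, not confirmed). A common mitigation for oversized batches is to split them on the client side. Below is a minimal sketch of greedy splitting, assuming you already know each statement's serialized size in bytes; split_batch is a hypothetical helper, not a driver API.

```python
def split_batch(statement_sizes, threshold=65536):
    """Greedily group statement sizes (bytes) into sub-batches.

    Hypothetical helper: each sub-batch totals <= threshold, except that a
    single statement larger than threshold ends up alone in its own sub-batch.
    """
    batches, current, current_size = [], [], 0
    for size in statement_sizes:
        # Start a new sub-batch if adding this statement would exceed the limit.
        if current and current_size + size > threshold:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

For example, ten statements of 22545 bytes each (about the size of the batch in the warning) would become five sub-batches of two statements, since three would already exceed 65536 bytes.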

Environment:
We are running Apache Cassandra 2.1.9 on a multi-DC cluster: a primary DC (more
C* nodes, on SSDs, and where the apps run) and a secondary DC (geographically
remote, acting more like DR for the primary) on SAS drives.
Cassandra config:

Java 1.8.0_65
Garbage Collector: G1GC
memtable_allocation_type: offheap_objects

Since this OOM I am seeing a huge pile-up of hints on the majority of the nodes,
and the pending hints keep growing. I increased the HintedHandoff core threads to
6, but that did not help (admittedly, I only tried this on one node).
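One way to reason about a backlog that keeps growing: it can only drain if hints are delivered faster than new ones are written, so extra delivery threads help only if they close that gap. A minimal sketch of the arithmetic; hint_backlog_hours is a hypothetical helper, and both rates are assumed inputs you would have to measure yourself, since Cassandra does not report them in this form.

```python
def hint_backlog_hours(backlog_gb, incoming_gb_per_hour, delivered_gb_per_hour):
    """Hours needed to drain a hint backlog, or None if it never drains.

    All inputs are hypothetical measurements supplied by the caller.
    """
    net_drain = delivered_gb_per_hour - incoming_gb_per_hour
    if net_drain <= 0:
        # Hints arrive at least as fast as they are delivered: the backlog
        # stays flat or grows no matter how long you wait.
        return None
    return backlog_gb / net_drain
```

For instance, a 90 GB backlog with 1 GB/h of new hints and 4 GB/h delivered would take 30 hours, while any case where delivery is slower than arrival returns None.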

nodetool compactionstats -H
pending tasks: 3
   compaction type   keyspace   table   completed      total    unit   progress
        Compaction     system   hints     28.5 GB   92.38 GB   bytes     30.85%
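To track how that hints compaction progresses over time, one option is to parse the rows of nodetool compactionstats -H and compute the remaining bytes. A minimal sketch, assuming the row layout shown above and binary (1024-based) units for the human-readable sizes; the regular expression and parse_row are hypothetical, not part of nodetool.

```python
import re

# Assumed 1024-based units; the percentage does not depend on this choice.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# type, keyspace, table, completed (value unit), total (value unit), unit, progress%
ROW = re.compile(
    r"^\s*(\S+)\s+(\S+)\s+(\S+)\s+"
    r"([\d.]+)\s*(B|KB|MB|GB|TB)\s+([\d.]+)\s*(B|KB|MB|GB|TB)\s+"
    r"(\S+)\s+([\d.]+)%\s*$"
)

def parse_row(line):
    """Parse one compaction row; return None for headers or other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    ctype, keyspace, table, c_val, c_unit, t_val, t_unit, _unit, pct = m.groups()
    completed = float(c_val) * UNITS[c_unit]
    total = float(t_val) * UNITS[t_unit]
    return {
        "type": ctype,
        "keyspace": keyspace,
        "table": table,
        "reported_pct": float(pct),
        "computed_pct": round(100 * completed / total, 2),
        "remaining_bytes": total - completed,
    }
```

For the hints row above this gives a computed progress of 30.85%, matching what nodetool reports, with roughly 63.88 GB still to compact.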


I'd appreciate your input here.

Thanks,
Shravan
