I neglected to mention that I also adjust the OOM score of Cassandra, to tell the kernel to kill something other than Cassandra. (For example, if one of your devs runs a script that uses a lot of memory, the kernel kills that script instead.)
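For anyone who wants to try this, here is a rough sketch of one way to do it, by writing to /proc/<pid>/oom_score_adj. The kernel accepts values from -1000 (never kill) to 1000 (kill first). The function name set_oom_score_adj and the PROC_ROOT override are names made up for this example, and -500 is only an illustrative value, not a recommendation.

```shell
#!/bin/sh
# Sketch only: lower a process's OOM score adjustment so the kernel
# prefers other processes when it has to kill something.
# set_oom_score_adj and PROC_ROOT are hypothetical names for this example;
# PROC_ROOT exists only so the function can be dry-run against a temp dir.

set_oom_score_adj() {
    pid="$1"
    score="$2"
    proc_root="${PROC_ROOT:-/proc}"

    # Reject values outside the kernel's accepted range (-1000..1000).
    if [ "$score" -lt -1000 ] || [ "$score" -gt 1000 ]; then
        echo "oom_score_adj must be between -1000 and 1000" >&2
        return 1
    fi

    # Lowering the value normally requires root.
    echo "$score" > "$proc_root/$pid/oom_score_adj"
}

# Example call, run as root (assumes a single Cassandra JVM whose command
# line contains CassandraDaemon):
#   set_oom_score_adj "$(pgrep -o -f CassandraDaemon)" -500
```

The setting belongs to the running process, so it has to be reapplied after every Cassandra restart, for example from the init script.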
http://lwn.net/Articles/317814/

> On 19 Feb 2015, at 5:28 am, Michał Łowicki <mlowi...@gmail.com> wrote:
>
> Hi,
>
> Couple of times a day 2 out of 4 members cluster nodes are killed
>
> root@db4:~# dmesg | grep -i oom
> [4811135.792657] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
> [6559049.307293] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
>
> Nodes are using 8GB heap (confirmed with *nodetool info*) and aren't using
> row cache.
>
> Noticed that couple of times a day used RSS is growing really fast within
> couple of minutes and I see CPU spikes at the same time -
> https://www.dropbox.com/s/khco2kdp4qdzjit/Screenshot%202015-02-18%2015.10.54.png?dl=0
>
> Could be related to compaction but after compaction is finished used RSS
> doesn't shrink. Output from pmap when C* process uses 50GB RAM (out of 64GB)
> is available on http://paste.ofcode.org/ZjLUA2dYVuKvJHAk9T3Hjb. At the time dump was made
> heap usage is far below 8GB (~3GB) but total RSS is ~50GB.
>
> Any help will be appreciated.
>
> --
> BR,
> Michał Łowicki