Re: better anti OOM

Peter Schuller Mon, 26 Dec 2011 16:05:42 -0800

> I suggest you describe exactly what the problem is you have and why you
> think stopping compaction/repair is the appropriate solution.
>
> compacting 41.7 GB CF with about 200 millions rows adds - 600 MB to heap,
> node logs messages like:


I don't know what you are basing that on. It seems unlikely to me that
the working set of a compaction is 600 MB. However, it may very well
be that the allocation rate is such that it contributes to an
additional 600 MB average heap usage after a CMS phase has completed.

> After node boot
> Heap Memory (MB) : 1157.98 / 1985.00
>
> disabled gossip + thrift, only compaction running
> Heap Memory (MB) : 1981.00 / 1985.00

Using "nodetool info" to monitor heap usage is not really useful
unless done continuously over time and observing the free heap after
CMS phases have completed. Regardless, the heap is always expected to
grow in usage to the occupancy trigger which kick-starts CMS. That
said, 1981/1985 does indicate an undesirable state for Cassandra, but
it does not mean that compaction is "using" 600 MB as such (in terms
of live set). You might say that it implies >= 600 MB of extra heap is
required at your current heap size and GC settings.
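
To make "free heap after CMS phases have completed" concrete, here is a
minimal sketch using the standard java.lang.management API. The class
name PostGcOccupancy is made up for illustration, and as written it
reports on the JVM it runs in; to watch a Cassandra node you would read
the same MemoryPool MXBeans over a remote JMX connection, which is not
shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper: compares current pool usage with usage right after
// the most recent collection of that pool.
final class PostGcOccupancy {

    /** One line per heap pool: usage now vs. usage after the last GC. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools (code cache, metaspace, ...)
            }
            MemoryUsage now = pool.getUsage();
            if (now == null) {
                continue; // pool is no longer valid
            }
            // getCollectionUsage() returns null if the pool does not support it.
            MemoryUsage afterGc = pool.getCollectionUsage();
            String after = (afterGc == null) ? "n/a" : (afterGc.getUsed() >> 20) + " MB";
            sb.append(String.format("%-28s now=%d MB  after-last-GC=%s%n",
                    pool.getName(), now.getUsed() >> 20, after));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; a single reading says little, the trend matters.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            System.out.print(report());
            Thread.sleep(1000);
        }
    }
}
```

The "after-last-GC" figure for the old generation pool is the number to
watch over time; the "now" figure is roughly what "nodetool info" shows
and will keep climbing toward the CMS trigger.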

If you want to understand what's happening, I suggest attaching with
visualvm/jconsole and looking at the GC behavior, and running with
-XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps. When attached with visualvm/jconsole you can
hit "perform gc" and see how far heap usage drops, to judge what the
actual live set is.

Also, you say it's "pretty dead". What exactly does that mean? Does it
OOM? I suspect you're just seeing fallbacks to full GC and long pauses
because you're allocating and promoting to old-gen fast enough that
CMS is just not keeping up, rather than it having to do with memory
"use" per se.

In your case, I suspect you simply need to run with a bigger heap or
reconfigure CMS to use additional threads for concurrent marking
(-XX:ParallelCMSThreads=XXX - try XXX = number of CPU cores for
example in this case). Alternatively, use a larger young gen so that
less gets promoted to old-gen during compaction.
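
As a rough illustration of how those three knobs fit together, the
sketch below assembles the corresponding JVM options. The class name
CmsFlagSketch and the sizes passed in main (4096 MB heap, 800 MB young
gen) are placeholders rather than recommendations; only the flag names
themselves (-Xmx, -Xmn, -XX:ParallelCMSThreads) are real HotSpot
options for CMS-era JVMs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the JVM options discussed above.
final class CmsFlagSketch {

    static List<String> flags(int heapMb, int youngGenMb, int cmsThreads) {
        if (heapMb <= 0 || youngGenMb <= 0 || cmsThreads <= 0) {
            throw new IllegalArgumentException("all values must be positive");
        }
        if (youngGenMb >= heapMb) {
            throw new IllegalArgumentException("young gen must be smaller than the heap");
        }
        List<String> out = new ArrayList<>();
        out.add("-Xmx" + heapMb + "m");                       // bigger max heap
        out.add("-Xmn" + youngGenMb + "m");                   // larger young gen
        out.add("-XX:ParallelCMSThreads=" + cmsThreads);      // concurrent marking threads
        return out;
    }

    public static void main(String[] args) {
        // "Number of CPU cores" as seen by this JVM.
        int cores = Runtime.getRuntime().availableProcessors();
        // Placeholder sizes; pick real values from observed GC behavior.
        System.out.println(String.join(" ", flags(4096, 800, cores)));
    }
}
```

The output is a space-separated option string that could be added to
the JVM options Cassandra is started with.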

But really, in short: the easiest fix is probably to increase the heap
size. I know this e-mail doesn't begin to explain the details, but it's
such a long story.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
