> collection runs for the cases tested. In most cases, I prefer having low
> pauses due to any garbage collection runs and don't care too much about the
> shape of the memory usage, and I guess that's the reason why the low pause
> collector is used by default for running cassandra. For myself, I have mixed
> feelings regarding the low pause collector, because I found it difficult to
> find good young generation sizings which are suitable to different load
> patterns. Therefore I mostly prefer the throughput collector, which
> adaptively sizes the young generation, doing a good job of avoiding too
> much data going to the tenured generation.

Well, if you care about pause times, usually the best bet would be to
have the young gen be as large as possible while still yielding what
you consider an acceptable pause time. I.e., as large as acceptable,
but no larger.
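
For example, one way to do that with CMS (the sizes here are made up;
tune against your own pause budget) is to pin the heap and young
generation explicitly instead of leaving the sizing adaptive:

    java -Xms8g -Xmx8g -Xmn1g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...

Then watch the ParNew pause times in the GC log and grow -Xmn until
the young generation pauses approach your limit.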

> I would be interested in what the differences in stop times are
> between the different GC variants when running cassandra. Is it really
> much better to use the low pause collector with regard to getting
> stable response times, even if I use the -XX:+UseParallelOldGC and
> -XX:MaxGCPauseMillis=nnn flags? Any experiences with
> this?

If you use the default (for the JVM, not for cassandra) throughput
collector, you *will* take full stop-the-world collections, period.
You can enable parallel GC, but with that collector there's no way
around the fact that full collections will pause the application for
the full duration of such full GCs. In general, the larger the heap
(relative to speed of the collection), the more of a problem this will
be. If you deem the pause times acceptable for your particular
use-case, I don't see an obvious reason to prefer the CMS collector.
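
To make that concrete, enabling the parallel throughput collector with
parallel full collections, plus GC logging so you can measure the
actual stop-the-world durations for your workload, looks roughly like
this (the heap size is a placeholder):

    java -Xmx8g -XX:+UseParallelGC -XX:+UseParallelOldGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...

The "Full GC" entries in the resulting log are the pauses you would be
committing to.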

MaxGCPauseMillis won't help; the throughput collector just doesn't
have any way to adhere to it. A full GC is a full GC.

For CMS, I'm not sure what, if any, effect the MaxGCPauseMillis has.
In my very limited testing I didn't see any obvious effect on, e.g.,
the sizing choice for the young generation (but I have not checked the
code to see if CMS uses it).
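
If you want to check empirically rather than read the code, one option
is to run the same load twice under CMS, with and without the flag, and
compare the young generation sizing reported in the GC logs (the value
below is arbitrary):

    java -XX:+UseConcMarkSweepGC -XX:MaxGCPauseMillis=50 \
         -XX:+PrintGCDetails -XX:+PrintHeapAtGC ...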

It is definitely used by the G1 collector; typically MaxGCPauseMillis
and GCPauseIntervalMillis are the two most important settings to
tweak. They are directly used to decide the young generation size, as
well as limit the number of non-young regions that are picked for GC
during a partial (not young-only) GC.
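
As a sketch (the millisecond values are placeholders, and G1 still
needs the experimental unlock flag on current JVMs):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC \
         -XX:MaxGCPauseMillis=50 -XX:GCPauseIntervalMillis=200 ...

This asks G1 for at most 50 ms of pause in any 200 ms interval, and G1
sizes the young generation and picks non-young regions accordingly.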

Has anyone run Cassandra with G1 in production for prolonged periods
of time? One thing that concerns me is the reliance on GC to remove
obsolete sstables. That relies on certain GC behavior that holds for
CMS and the throughput collector, but not for G1. With CMS, an
unreachable sstable will be detected when concurrent mark/sweep
triggers; but with G1, there is not necessarily any expectation at all
that some particular region that happens to contain the reference in
question will be collected - *ever* - since G1 always picks the "best"
regions first (best in terms of "bang for the buck" - the most memory
reclaimed at the lowest cost).
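
For anyone unfamiliar with the mechanism under discussion: it boils
down to phantom references, where cleanup runs only once the collector
has actually noticed that the referent is unreachable. A minimal,
self-contained Java sketch follows (this is not Cassandra's actual
code; the names are made up):

    import java.io.File;
    import java.lang.ref.PhantomReference;
    import java.lang.ref.ReferenceQueue;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class GcDrivenCleanup {
        static final ReferenceQueue<Object> QUEUE =
            new ReferenceQueue<Object>();
        // The references themselves must stay strongly reachable,
        // or they would be collected before ever being enqueued.
        static final Map<PhantomReference<Object>, File> PENDING =
            new ConcurrentHashMap<PhantomReference<Object>, File>();

        static void track(Object sstable, File backingFile) {
            PENDING.put(new PhantomReference<Object>(sstable, QUEUE),
                        backingFile);
        }

        // A background thread would call this in a loop.
        static void drain() throws InterruptedException {
            // remove() blocks until the GC has enqueued a reference,
            // i.e. until a collection has visited the dead object.
            Object enqueued = QUEUE.remove();
            File f = PENDING.remove(enqueued);
            if (f != null)
                f.delete();
        }
    }

The crucial point is that drain() makes no progress until the GC has
enqueued the reference; under G1, if the region holding the last
reference to the sstable is never selected for collection, that may
simply never happen.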

-- 
/ Peter Schuller
