[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

Rick Branson (JIRA) Wed, 02 Jul 2014 13:36:46 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050677#comment-14050677
 ]


Rick Branson commented on CASSANDRA-7486:
-----------------------------------------

Mad anecdotes:

We ran with G1 enabled for around 4 days in a 33-node cluster running 1.2.17 on 
JDK7u45 that has around a 1:5 read:write ratio. We tried a few different 
configurations with short durations, but most of the time we ran it with the 
out-of-the-box G1 configuration on a 20G heap and 32 parallel GC threads (16 
core, 32 hyperthreaded). Some somewhat scary G1 bugs that were not fixed until 
7u60 ultimately caused me to roll back to the CMS collector after the experiment.

* The experiment pointed out that our young gen was basically too small and was 
pulling latency up significantly. When we returned to CMS, I doubled new 
size from 800M -> 1600M. We had moved to new hardware and hadn't taken the time 
to sit down and play with GC settings. This cut our mean latency dramatically 
as perceived from the client, ~50% for writes and ~30% for reads, similar to 
what we saw with G1. I was quite thrilled with this result.
* I tried both 100ms and 150ms pause time targets with 12G, 16G, and 20G 
heaps, and while these resulted in slightly lower mean latency (~5-10%), Mixed 
GC activity caused P99s to suffer greatly. There's compelling evidence that the 
200ms default is nearly ideal for the way the G1 algorithm works in its current 
incarnation.
* We basically needed a 20G heap to make G1 work well for us, since by default 
G1 will use up to half of the max heap for eden space and Cassandra needs quite 
a large old gen to stay happy. G1 appears to need a much larger eden space to 
work efficiently, sizes that would make ParNew die in a fire. GCs of the eden 
space were impressively fast, with a ~10G eden space taking ~120ms on average 
to collect.
* G1's huge eden space was helpful in working around some issues with 
compaction on the hints CF, which had dozens of very wide partitions with 
hundreds of thousands of cells each.
* Overall, at the default 200ms pause time target, we didn't see much of an 
increase in CPU usage over CMS.
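
For reference, the two setups described in the bullets above can be written 
out as HotSpot flags. This is a minimal sketch, not the exact options we ran 
with: the helper names are hypothetical, setting -Xms equal to -Xmx is an 
assumption, and the CMS heap size is left as a parameter because it is not 
stated above.

```python
def g1_flags(heap_gb=20, gc_threads=32, pause_ms=200):
    """HotSpot flags for the G1 run: 20G heap, 32 parallel GC threads,
    and the default 200ms pause time target made explicit."""
    return [
        "-XX:+UseG1GC",
        f"-Xms{heap_gb}G",  # assumption: fixed-size heap (min == max)
        f"-Xmx{heap_gb}G",
        f"-XX:ParallelGCThreads={gc_threads}",
        f"-XX:MaxGCPauseMillis={pause_ms}",
    ]


def cms_flags(heap_gb, new_size_mb=1600):
    """HotSpot flags for ParNew + CMS with the doubled new size (1600M).
    heap_gb has no default because the CMS heap size is not given above."""
    return [
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        f"-Xms{heap_gb}G",  # assumption: fixed-size heap (min == max)
        f"-Xmx{heap_gb}G",
        f"-Xmn{new_size_mb}M",
    ]


if __name__ == "__main__":
    print(" ".join(g1_flags()))
    print(" ".join(cms_flags(heap_gb=8)))  # 8G is purely illustrative
```

The 100ms and 150ms experiments would correspond to passing pause_ms=100 or 
pause_ms=150 together with a different heap_gb.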

In the end, my tests basically told us that G1 requires a larger heap to get 
the same results with *far* less tuning. If there are GC issues, it seems like 
in the vast majority of cases G1 can either eliminate them or make it easy 
to just work around them by cranking up the heap size. Someone should probably 
test G1 with a variable-sized heap since it's designed to give back RAM when it 
thinks it doesn't need it. That might or might not actually work. While we 
didn't test this, a configuration of G1 + heap size min of 1/8 RAM and max of 
1/2 RAM might make a really nice default for Cassandra at some point.
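
As a rough illustration of that untested default, the heap bounds could be 
derived from physical RAM like this. It is only a sketch: the function name 
is hypothetical, the RAM figure is supplied by the caller, and nothing here 
has been validated against a real cluster.

```python
def variable_heap_flags(ram_mb):
    """Suggest G1 flags with min heap = 1/8 of RAM and max heap = 1/2 of RAM.

    ram_mb is the machine's physical memory in megabytes (caller-supplied).
    """
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    min_heap_mb = ram_mb // 8
    max_heap_mb = ram_mb // 2
    return [
        "-XX:+UseG1GC",
        f"-Xms{min_heap_mb}M",
        f"-Xmx{max_heap_mb}M",
    ]


if __name__ == "__main__":
    # Example: a machine with 64G of RAM gets an 8G minimum, 32G maximum heap.
    print(" ".join(variable_heap_flags(64 * 1024)))
```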

> Compare CMS and G1 pause times
> ------------------------------
>
>                 Key: CASSANDRA-7486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
>             Project: Cassandra
>          Issue Type: Test
>          Components: Config
>            Reporter: Jonathan Ellis
>            Assignee: Ryan McGuire
>             Fix For: 2.1.0
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
