[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519456#comment-14519456
 ] 

Ariel Weisberg commented on CASSANDRA-7486:
-------------------------------------------

For the C* we ship today, we should evaluate whether G1 is better. For the 
platonic ideal C* (where the heap only needs to be 1 or 2 GB), I suspect we 
should ship CMS, because I found it has lower baseline pause times for young 
gen collections, especially on server-class hardware.

This is something that we can do in a data-driven way. [[email protected]] 
got some good data, but when I sampled throughput (from the spreadsheet) on 
some workloads, like the 12g CMS and G1 runs, I saw more throughput under CMS. 
I think we should munge the data a bit and visualize throughput and P99 (or 
P99.9). I am also not a fan of basing the decision on that number of cores and 
a non-NUMA machine, which is not representative of the hardware people use.
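
As a rough illustration of the munging step, here is a minimal Python sketch that 
reduces one run's per-operation latencies to throughput, P99 and P99.9. The 
function names, the input shape (a list of latencies in milliseconds plus the run 
duration in seconds) and the nearest-rank percentile definition are assumptions 
for illustration; none of them come from the spreadsheet.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so pct=0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def summarize(latencies_ms, duration_s):
    """Reduce one run (hypothetical input shape) to the numbers worth plotting."""
    return {
        "throughput_ops_s": len(latencies_ms) / duration_s,
        "p99_ms": percentile(latencies_ms, 99),
        "p99_9_ms": percentile(latencies_ms, 99.9),
    }
```

Running summarize() once per collector and heap size would give one row per 
configuration, which can then be plotted side by side.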

I am not comfortable with the measurements for large heaps because, if I am 
reading correctly, there was pretty much never an old generation collection 
under the workload I looked at. The old gen was growing but never reached the 
point where it needed an old gen GC. It's great the server can run that long 
with so little promotion (TIL that is a thing that happens). That explains the 
very long young gen pauses: lots of survivor copying, I guess, judging by the 
size of the survivor set vs. pause time. I saw young gen pauses in the 400+ 
millisecond range under both collectors.
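
One way to check for long pauses across runs is to pull stopped-time entries out 
of the GC logs. The Python sketch below assumes the logs were produced with the 
HotSpot flag -XX:+PrintGCApplicationStoppedTime, which this ticket does not 
state, and the helper names are hypothetical. Note that those entries cover all 
safepoint pauses, not only young gen collections.

```python
import re

# Matches HotSpot lines such as:
#   Total time for which application threads were stopped: 0.4123450 seconds
STOPPED = re.compile(r"application threads were stopped: ([0-9.]+) seconds")

def pause_times_ms(log_lines):
    """Return every stopped-time entry found in the log, in milliseconds."""
    pauses = []
    for line in log_lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)) * 1000.0)
    return pauses

def count_over(pauses_ms, threshold_ms):
    """Count pauses strictly longer than threshold_ms."""
    return sum(1 for p in pauses_ms if p > threshold_ms)
```

For example, count_over(pause_times_ms(open(path)), 400) would count the pauses 
above 400 ms in one log file, where path is whatever GC log is being examined.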

Another behavior to consider is worst case pause time when there is 
fragmentation.

With all the overhead of survivor copying I start to wonder if a valid strategy 
would be to allow promotion and let the concurrent collector run all the time. 
That would bring down young-gen GC pauses in exchange for throughput.

I think whether CASSANDRA-8099 means no off-heap memtables in 3.0 is also a 
factor. If G1 scales to larger heaps and larger on-heap memtables, then it will 
be a better choice.

> Compare CMS and G1 pause times
> ------------------------------
>
>                 Key: CASSANDRA-7486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
>             Project: Cassandra
>          Issue Type: Test
>          Components: Config
>            Reporter: Jonathan Ellis
>            Assignee: Shawn Kumar
>             Fix For: 2.1.5
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
