This topic comes up quite a bit. Enough, in fact, that I've done a one-hour webinar on it, covering how the JVM GC works and what you need to consider when tuning it for Cassandra:
https://www.youtube.com/watch?v=7B_w6YDYSwA

With your specific problem - a full GC that doesn't reduce the old gen - the most obvious answer is "there's not much garbage to collect". Take a look at nodetool tpstats. Do you see lots of blocked MemtableFlushWriters?

Jon

On Thu Dec 18 2014 at 2:01:00 PM Y.Wong <yungmw...@gmail.com> wrote:

> V
> On Dec 4, 2014 11:14 PM, "Philo Yang" <ud1...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a cluster on C* 2.1.1 and JDK 1.7.0_51. I have a problem with
>> full GC: sometimes one or two nodes run a full GC more than once per
>> minute, taking over 10 seconds each time. The node then becomes
>> unreachable and the latency of the whole cluster goes up.
>>
>> Grepping the GCInspector log, I found that when a node is running fine,
>> without GC trouble, there are two kinds of GC:
>>
>> - ParNew GC in less than 300ms, which clears Par Eden Space and grows
>>   CMS Old Gen / Par Survivor Space only a little (since GCInspector only
>>   logs collections longer than 200ms, only a small number of ParNew GCs
>>   appear in the log);
>> - ConcurrentMarkSweep in 4000-8000ms, which reduces CMS Old Gen a lot
>>   and grows Par Eden Space a little; it runs about once every 1-2 hours.
>>
>> However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>>
>> INFO [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 - ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
>> INFO [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 - ConcurrentMarkSweep GC in 12227ms. CMS Old Gen: 3579838464 -> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
>> INFO [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 - ConcurrentMarkSweep GC in 11538ms. CMS Old Gen: 3579836688 -> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
>> INFO [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 - ConcurrentMarkSweep GC in 12180ms. CMS Old Gen: 3579835784 -> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
>> INFO [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 - ConcurrentMarkSweep GC in 10574ms. CMS Old Gen: 3579838112 -> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
>> INFO [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 - ConcurrentMarkSweep GC in 11594ms. CMS Old Gen: 3579831424 -> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
>> INFO [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 - ConcurrentMarkSweep GC in 11463ms. CMS Old Gen: 3579817392 -> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
>> INFO [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 - ConcurrentMarkSweep GC in 9576ms. CMS Old Gen: 3579838424 -> 3579816424; Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
>> INFO [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 - ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
>> INFO [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 - ConcurrentMarkSweep GC in 12082ms. CMS Old Gen: 3579786592 -> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>>
>> Each time, Old Gen shrinks only a little; Survivor Space is cleared, but
>> the heap is still full, so another full GC follows very soon and the node
>> goes down. If I restart the node, it runs fine without GC trouble.
>>
>> Can anyone help me find out why full GC can't reduce CMS Old Gen? Is it
>> because there are too many objects in the heap that can't be recycled? I
>> think reviewing the table schema design and adding new nodes to the
>> cluster is a good idea, but I still want to know whether there is any
>> other reason for this trouble.
>>
>> Thanks,
>> Philo Yang
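A quick way to make the symptom concrete is to pull the before/after Old Gen numbers out of the GCInspector lines and diff them. This is only a sketch against the exact log format quoted above (the regex is an assumption and may need adjusting for other Cassandra versions); two sample lines from the thread are inlined here instead of reading the real system.log:

```shell
# Sketch: measure how much CMS Old Gen each full collection actually reclaimed.
# The two sample lines are copied from the GCInspector output quoted above; in
# practice you would feed in the node's system.log instead of this printf.
printf '%s\n' \
  'ConcurrentMarkSweep GC in 12648ms. CMS Old Gen: 3579838424 -> 3579838464;' \
  'ConcurrentMarkSweep GC in 11556ms. CMS Old Gen: 3579816424 -> 3579785496;' |
sed -E 's/.*CMS Old Gen: ([0-9]+) -> ([0-9]+).*/\1 \2/' |   # keep before/after bytes
awk '{ printf "reclaimed %d bytes of %.2f GB old gen\n", $1 - $2, $1 / 1024 / 1024 / 1024 }'
# reclaimed -40 bytes of 3.33 GB old gen
# reclaimed 30928 bytes of 3.33 GB old gen
```

Run over the quoted log, every collection reclaims at most a few tens of KB (sometimes a negative amount) out of roughly 3.3 GB, which is consistent with Jon's point: the old gen is almost entirely live objects, not uncollected garbage.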