I recommend reading through
https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how
the JVM GC works and what you can do to tune it.  Also good is Blake
Eggleston's writeup, which can be found here:
http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html

I'd like to note that a 4GB heap is unlikely to be sufficient for Cassandra
under any serious workload.
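
For reference, the stock cassandra-env.sh in 2.1 derives the default heap
as roughly max(min(1/2 RAM, 1G), min(1/4 RAM, 8G)), which is exactly why a
16G box lands at ~4G and a 64G box at ~8G. A minimal Python sketch of that
calculation, from memory of the script (check your own copy):

    def default_max_heap_mb(system_memory_mb):
        # cassandra-env.sh caps half-of-RAM at 1024 MB and quarter-of-RAM
        # at 8192 MB, then takes the larger of the two
        half = system_memory_mb // 2
        quarter = system_memory_mb // 4
        return max(min(half, 1024), min(quarter, 8192))

    for ram_gb in (16, 64):
        print("%d GB RAM -> %d MB heap"
              % (ram_gb, default_max_heap_mb(ram_gb * 1024)))
    # 16 GB RAM -> 4096 MB heap
    # 64 GB RAM -> 8192 MB heap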


On Thu Dec 04 2014 at 8:43:38 PM Philo Yang <ud1...@gmail.com> wrote:

> I have two kinds of machine:
> 16G RAM, with default heap size setting, about 4G.
> 64G RAM, with default heap size setting, about 8G.
>
> These two kinds of nodes have the same number of vnodes, and both of them
> have the GC issue, although the 16G nodes hit it more often.
>
> Thanks,
> Philo Yang
>
>
> 2014-12-05 12:34 GMT+08:00 Tim Heckman <t...@pagerduty.com>:
>
>> On Dec 4, 2014 8:14 PM, "Philo Yang" <ud1...@gmail.com> wrote:
>> >
>> > Hi,all
>> >
>> > I have a cluster on C* 2.1.1 and JDK 1.7u51. I'm having trouble with
>> > full GC: sometimes one or two nodes run a full GC more than once per
>> > minute, taking over 10 seconds each time; the node then becomes
>> > unreachable and cluster latency goes up.
>> >
>> > Grepping GCInspector's log, I found that when a node is running fine
>> > without GC trouble there are two kinds of GC:
>> > ParNew GC taking less than 300ms, which clears Par Eden Space and grows
>> > CMS Old Gen / Par Survivor Space a little (since GCInspector only logs
>> > GCs longer than 200ms, only a small number of ParNew GCs show up in the
>> > log).
>> > ConcurrentMarkSweep taking 4000~8000ms, which shrinks CMS Old Gen a lot
>> > and grows Par Eden Space a little; it runs about once every 1-2 hours.
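>> >
>> > (A minimal sketch of how one can tally these from the log, assuming the
>> > GCInspector line format shown below; pipe in the output of, e.g.,
>> > grep GCInspector system.log:)
>> >
>> >     import re, sys
>> >
>> >     # count and total pause per collector from GCInspector lines on stdin
>> >     pat = re.compile(r"(ParNew|ConcurrentMarkSweep) GC in (\d+)ms")
>> >     totals = {}
>> >     for line in sys.stdin:
>> >         m = pat.search(line)
>> >         if m:
>> >             name, ms = m.group(1), int(m.group(2))
>> >             n, total = totals.get(name, (0, 0))
>> >             totals[name] = (n + 1, total + ms)
>> >     for name, (n, total) in totals.items():
>> >         print("%s: %d GCs, %d ms total, %d ms avg"
>> >               % (name, n, total, total // n))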
>> >
>> > However, sometimes ConcurrentMarkSweep behaves strangely, like this:
>> >
>> > INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -> 3579838464;
>> Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512;
>> Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -> 3579805792;
>> Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -> 3579829760;
>> Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -> 3579799752;
>> Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -> 3579817392;
>> Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -> 3579838424;
>> Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424;
>> Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -> 3579785496;
>> Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
>> > INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
>> ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -> 3579814464;
>> Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0
>> >
>> > Each time, Old Gen shrinks only a little and Survivor Space is cleared,
>> > but the heap stays full, so another full GC follows very soon and then
>> > the node goes down. If I restart the node, it runs fine without GC
>> > trouble.
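>> >
>> > (To quantify this, a quick sketch that pulls the before/after Old Gen
>> > sizes out of the lines above; the file name is just a placeholder:)
>> >
>> >     import re
>> >
>> >     # how many bytes each CMS cycle actually reclaimed from the old gen
>> >     pat = re.compile(r"CMS Old Gen: (\d+) -> (\d+)")
>> >     for line in open("gcinspector-excerpt.log"):  # placeholder path
>> >         m = pat.search(line)
>> >         if m:
>> >             before, after = int(m.group(1)), int(m.group(2))
>> >             print("reclaimed %.1f MB of %.0f MB old gen"
>> >                   % ((before - after) / 1e6, before / 1e6))
>> >
>> > (Every cycle above reclaims well under 1 MB of a ~3580 MB old gen.)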
>> >
>> > Can anyone help me find out why full GC can't reduce CMS Old Gen? Is it
>> > because there are too many objects in the heap that can't be collected?
>> > I think reviewing the table schema design and adding new nodes to the
>> > cluster is a good idea, but I still want to know whether there is any
>> > other reason for this trouble.
>>
>> How much total system memory do you have? How much is allocated for heap
>> usage? How big is your working data set?
>>
>> The reason I ask is that I've seen lots of GC with no room gained, and it
>> turned out to be memory pressure: there simply wasn't enough for the heap.
>> We decided that just increasing the heap size was a bad idea, since we
>> relied on free RAM for filesystem caching. So a mix of vertical and
>> horizontal scaling let us give Cass more heap space and also spread the
>> workload out to avoid further problems.
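>>
>> (Back-of-the-envelope, with made-up numbers: whatever the heap takes is
>> RAM the kernel can no longer use for the page cache.)
>>
>>     total_ram_gb = 16   # illustrative, not measured
>>     heap_gb = 4
>>     offheap_gb = 2      # rough allowance for C* off-heap structures + OS
>>     print("~%d GB left for filesystem cache"
>>           % (total_ram_gb - heap_gb - offheap_gb))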
>>
>> > Thanks,
>> > Philo Yang
>>
>> Cheers!
>> -Tim
>>
>
>
