[ https://issues.apache.org/jira/browse/CASSANDRA-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999895#comment-14999895 ]

Andy Caldwell commented on CASSANDRA-9805:
------------------------------------------

Further investigation (see the discussion in 
https://github.com/Metaswitch/clearwater-cassandra/issues/42) suggests you are 
correct, and this is a CMS collection.  Unfortunately, CMS is not very 
concurrent on a single-core machine (the GC thread takes over the only core 
during the collection and there is no other core to handle service load until 
it completes), so each CMS cycle is effectively an outage.  As I previously 
mentioned, we didn't see the steadily climbing memory usage on 1.2.12; we only 
see it on 2.0 and 2.1 installs.  This is making monitoring our Cassandra 
deployment more difficult, as we can't afford to use nodetool.
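
For reference, a minimal sketch of one possible workaround on the monitoring 
side: hold a single long-lived JMX connection to the node and poll heap and 
CMS statistics over it, instead of spawning a fresh nodetool (and therefore a 
fresh RMI connection) every few minutes.  This assumes Cassandra's default JMX 
port 7199 with no JMX authentication; the class name and polling interval are 
illustrative only.

import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Illustrative monitor: one long-lived JMX connection instead of repeated
// nodetool invocations (each of which opens a fresh RMI connection).
// Assumes Cassandra's default JMX port 7199 with no authentication.
public class HeapMonitor {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName memory = new ObjectName("java.lang:type=Memory");
            // MBean name of the old-generation collector when CMS is in use.
            ObjectName cms =
                new ObjectName("java.lang:type=GarbageCollector,name=ConcurrentMarkSweep");
            while (true) {
                MemoryUsage heap = MemoryUsage.from(
                    (CompositeData) mbs.getAttribute(memory, "HeapMemoryUsage"));
                long cmsCount = (Long) mbs.getAttribute(cms, "CollectionCount");
                long cmsTimeMs = (Long) mbs.getAttribute(cms, "CollectionTime");
                System.out.printf("heap %dMB/%dMB, CMS collections %d (%dms total)%n",
                    heap.getUsed() >> 20, heap.getMax() >> 20, cmsCount, cmsTimeMs);
                Thread.sleep(5 * 60 * 1000);  // poll every ~5 minutes, as we did with nodetool
            }
        }
    }
}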

According to https://wiki.apache.org/cassandra/CassandraHardware, you recommend 
running Cassandra on large machines (8 cores is your "sweet spot" for 
hardware), which would mostly sidestep this problem, but you also recommend 
running on large EC2 instances, which have only 2 cores 
(https://aws.amazon.com/ec2/instance-types/) and so will halve in capacity 
during a CMS collection.  Do you have any advice on running Cassandra on 
machines with limited cores, so that we can avoid or mitigate the cost of the 
garbage collector?

> nodetool status causes garbage to be accrued
> --------------------------------------------
>
>                 Key: CASSANDRA-9805
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9805
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Ubuntu 14.04 64-bit
> Cassandra 2.0.14
> Java 1.7.0 OpenJDK
>            Reporter: Andy Caldwell
>
> As part of monitoring our Cassandra clusters (generally 2-6 nodes) we were 
> running `nodetool status` regularly (~ every 5 minutes).  On Cassandra 1.2.12 
> this worked fine and had negligible effect on the Cassandra database service.
> Having upgraded to Cassandra 2.0.14, we've found that, over time, the tenured 
> memory space slowly fills with `RMIConnectionImpl` objects (and some other 
> associated objects) until we start running into memory pressure, triggering 
> proactive and then stop-the-world (STW) GC (which obviously impacts the 
> performance of the cluster).  It seems that these objects are kept around 
> long enough to get promoted from Eden to tenured and are then not considered 
> for collection (due to internal reference cycles?).
> Very easy to reproduce: just call `nodetool status` in a loop and watch the 
> memory usage climb to capacity, then drop to empty after a STW collection.  
> No need to be accessing the DB keys at all.
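
For completeness, an illustrative sketch of the repro loop described above, 
written as a small Java driver.  It assumes `nodetool` is on the PATH of a 
machine that can reach the node; the one-second pause between invocations is 
arbitrary.

import java.io.IOException;

// Repro sketch: invoke `nodetool status` repeatedly, as described above.  Each
// invocation opens (and closes) a fresh JMX/RMI connection to the node; per the
// report, the server-side RMIConnectionImpl objects then accumulate in tenured.
public class NodetoolStatusLoop {
    public static void main(String[] args) throws IOException, InterruptedException {
        while (true) {
            Process p = new ProcessBuilder("nodetool", "status")
                    .inheritIO()   // show nodetool's output on this console
                    .start();
            p.waitFor();           // let each invocation finish before the next
            Thread.sleep(1000);    // arbitrary pause between invocations
        }
    }
}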



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
