[
https://issues.apache.org/jira/browse/CASSANDRA-9805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999895#comment-14999895
]
Andy Caldwell commented on CASSANDRA-9805:
------------------------------------------
Further investigation (see the discussion in
https://github.com/Metaswitch/clearwater-cassandra/issues/42) suggests you are
correct and this is a CMS collection. Unfortunately, CMS is not very concurrent
on a single-core machine: the GC thread takes over the only core for the
duration of the collection and there is no other core left to handle service
load, so the GC is effectively an outage. As I mentioned previously, we didn't
see the steadily climbing memory usage on 1.2.12; we only see it on 2.0 or 2.1
installs. This is making monitoring our Cassandra deployment more difficult,
as we can't afford to use nodetool.
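For reference, the pauses can be correlated with CMS cycles by enabling GC
logging. A minimal sketch of the flags, added to cassandra-env.sh via the usual
JVM_OPTS pattern (the log path here is only an example), would be:

    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"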
According to https://wiki.apache.org/cassandra/CassandraHardware, you recommend
running Cassandra on large machines (8 cores is your "sweet spot" for
hardware), which would mostly sidestep this problem, but you also recommend
running on large EC2 instances, which have only 2 cores
(https://aws.amazon.com/ec2/instance-types/) and so will halve in capacity
during a CMS collection. Do you have any advice on running Cassandra on
machines with limited cores so that we can avoid or mitigate the cost of the
garbage collector?
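For concreteness, the sort of tuning we have in mind is sketched below (values
are purely illustrative and may well be the wrong direction, which is partly
why we're asking): start the concurrent cycle earlier so it finishes before the
old generation fills, and keep the GC thread counts explicit on a box with very
few cores, via cassandra-env.sh:

    # CMS itself is already the Cassandra default; shown here for completeness
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly"
    # kick off the concurrent cycle earlier (default in cassandra-env.sh is 75)
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=60"
    # pin GC thread counts on machines with 1-2 cores
    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=1 -XX:ConcGCThreads=1"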
> nodetool status causes garbage to be accrued
> --------------------------------------------
>
> Key: CASSANDRA-9805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9805
> Project: Cassandra
> Issue Type: Bug
> Components: Tools
> Environment: Ubuntu 14.04 64-bit
> Cassandra 2.0.14
> Java 1.7.0 OpenJDK
> Reporter: Andy Caldwell
>
> As part of monitoring our Cassandra clusters (generally 2-6 nodes) we were
> running `nodetool status` regularly (~ every 5 minutes). On Cassandra 1.2.12
> this worked fine and had negligible effect on the Cassandra database service.
> Having upgraded to Cassandra 2.0.14, we've found that, over time, the tenured
> memory space slowly fills with `RMIConnectionImpl` objects (and some other
> associated objects) until we start running into memory pressure, triggering
> first proactive and then stop-the-world (STW) GC (which obviously impacts the
> performance of the cluster). It seems that these objects are kept around long
> enough to get promoted from Eden to tenured and then don't get considered for
> collection (due to internal reference cycles?).
> Very easy to reproduce: just call `nodetool status` in a loop (see the sketch
> below) and watch the memory usage climb to capacity and then drop to empty
> after a STW collection. No need to be accessing the DB keys at all.
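A minimal reproduction along those lines (the 300-second sleep matches the
roughly 5-minute polling interval; the PID lookup is illustrative and will vary
by install) is:

    # poll cluster status the way the monitoring does (every 5 minutes)
    while true; do nodetool status > /dev/null; sleep 300; done

    # in another shell: watch the old generation fill, and check what it holds
    jstat -gcold $(pgrep -f CassandraDaemon) 10s
    jmap -histo $(pgrep -f CassandraDaemon) | grep RMIConnectionImpl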
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)