[ 
https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775223#comment-16775223
 ] 

Jonas Borgström commented on CASSANDRA-15006:
---------------------------------------------

My test cluster has now been running for one more week since my last report 
with the same CQL read/write load and one full repair every hour.

There we can see the "java.nio:type=BufferPool,name=direct/MemoryUsed" graph 
still shows a very linear growth (10-15MB/day).

Last Saturday there were a short outage causing one of the nodes to restart. 
This seems to have caused that node to report a values that is consistently 
about 300MB lower than the other nodes but besides that it is now increasing at 
the same rate as the other two nodes.

The next thing I'm going to try is to change the "nodetool repair --full" 
frequency from once per hour to once every 5 days to see if that has any affect 
on the rate of the leak.

 

Btw, do we know anyone else how are monitoring the 
"java.nio:type=BufferPool,name=direct" JMX Mbean and are able to graph it? It 
would be very interesting to see how it looks for other clusters.

 

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but 
> that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, 
> Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana 
> - Cassandra(1).png, Screenshot_2019-02-14 Grafana - Cassandra.png, 
> Screenshot_2019-02-15 Grafana - Cassandra.png, Screenshot_2019-02-22 Grafana 
> - Cassandra.png, cassandra.yaml, cmdline.txt
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly 
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure 
> looks like the
> "java.nio:type=BufferPool,name=direct" Mbean shows a very linear growth 
> (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing 
> linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it 
> would take quite a few days until it becomes noticeable. I'm able to see the 
> same type of slow growth in other production clusters even though the graph 
> data is more noisy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to