[ https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768279#comment-16768279 ]

Jonas Borgström commented on CASSANDRA-15006:
---------------------------------------------

I've just uploaded an updated Grafana screenshot that shows that the direct 
(off-heap) allocations are still increasing linearly after 22 days.

The reference-chains screenshot is from a tool called JXRay, and I believe the 
top 5.8 GiB entry is for something called java.nio.DirectByteBufferR, not 
java.nio.DirectByteBuffer. I'm no Java developer, but I believe that class 
represents mmapped memory rather than off-heap allocated memory. This matches 
the top-right graph (java.nio:type=BufferPool,name=mapped).
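
To illustrate the distinction (a minimal sketch, not Cassandra code; the file 
path is made up): ByteBuffer.allocateDirect returns a java.nio.DirectByteBuffer 
and is counted in the "direct" buffer pool, while FileChannel.map with 
MapMode.READ_ONLY returns a java.nio.DirectByteBufferR and is counted in the 
"mapped" pool:

{code:java}
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class BufferKinds {
    public static void main(String[] args) throws Exception {
        // Explicit off-heap allocation: an instance of java.nio.DirectByteBuffer,
        // tracked by the java.nio:type=BufferPool,name=direct MBean.
        ByteBuffer direct = ByteBuffer.allocateDirect(1 << 20);
        System.out.println(direct.getClass().getName()); // java.nio.DirectByteBuffer

        // Read-only mmap of a file (path is just an example): an instance of
        // java.nio.DirectByteBufferR, tracked by java.nio:type=BufferPool,name=mapped.
        try (RandomAccessFile raf = new RandomAccessFile("/tmp/example.dat", "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            System.out.println(mapped.getClass().getName()); // java.nio.DirectByteBufferR
        }
    }
}
{code}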

Since I'm continuously both adding new data and reading all existing data, I 
guess it is expected that the amount of mapped memory grows to match the size 
of the ever-growing SSTables on disk. That should be fine, since the Linux OOM 
killer should not kill the java process for using too much memory-mapped 
memory.

But the OOM killer will kill the java process if too much off-heap memory is 
used. Perhaps Cassandra for some reason needs to allocate a bit of 
direct/off-heap memory for every memory-mapped chunk it accesses?
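
For what it's worth, the two curves in my graphs can be read straight from the 
platform BufferPoolMXBean instances (the same data the JMX exporter feeds to 
Grafana). A minimal sketch:

{code:java}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class BufferPoolStats {
    public static void main(String[] args) {
        // "direct" = explicit off-heap allocations (what can trigger the OOM killer),
        // "mapped" = mmapped file regions (backed by reclaimable page cache).
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%-8s count=%d memoryUsed=%d totalCapacity=%d%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
{code}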

 

Next I'll try to truncate the largest table to see what kind of effect that 
will have on the java.nio.DirectByteBuffer usage.

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but 
> that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, 
> Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana 
> - Cassandra.png
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly 
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure 
> looks like the
> "java.nio:type=BufferPool,name=direct" Mbean shows a very linear growth 
> (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing 
> linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it 
> would take quite a few days until it becomes noticeable. I'm able to see the 
> same type of slow growth in other production clusters, even though the graph 
> data there is noisier.


