[jira] [Commented] (CASSANDRA-15006) Possible java.nio.DirectByteBuffer leak

Benedict (JIRA) Fri, 08 Feb 2019 02:12:12 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763463#comment-16763463
 ]


Benedict commented on CASSANDRA-15006:
--------------------------------------

So, in CASSANDRA-11993 the BufferPool began being abused for jobs it wasn't 
intended for.  Its main user is now the chunk cache, so the lifetime of its 
buffers is considerably longer than was intended, with no necessary 
correlation between when {{free()}} is invoked for the different allocations 
from a given chunk.  This is almost certainly a bug, as a BufferPool chunk may 
be mostly unused but remain allocated for as long as a single ChunkCache chunk 
still needs it.
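
To illustrate the pinning effect described above, here is a minimal sketch. It is not Cassandra's BufferPool code: the {{Chunk}} class, its methods and the sizes are hypothetical, chosen only to show how one long-lived slice can keep a mostly unused slab allocated.

```java
import java.nio.ByteBuffer;

public class ChunkPinningSketch {
    /** Hypothetical stand-in for a pool chunk: one direct slab handed out in slices. */
    static final class Chunk {
        private final ByteBuffer slab;
        private int nextOffset = 0;   // bump pointer into the slab
        private int liveBytes = 0;    // bytes handed out and not yet freed

        Chunk(int capacity) {
            slab = ByteBuffer.allocateDirect(capacity);
        }

        /** Hands out the next `size` bytes of the slab as an independent view. */
        ByteBuffer allocate(int size) {
            if (nextOffset + size > slab.capacity())
                throw new IllegalStateException("chunk exhausted");
            ByteBuffer view = slab.duplicate();
            view.limit(nextOffset + size);
            view.position(nextOffset);
            nextOffset += size;
            liveBytes += size;
            return view.slice();
        }

        /** Marks a slice as no longer needed; the slab itself is untouched. */
        void free(ByteBuffer slice) {
            liveBytes -= slice.capacity();
        }

        /** The slab can only go back to the allocator once nothing is live. */
        boolean isReleasable() {
            return liveBytes == 0;
        }

        /** Memory that stays reserved although no caller is using it. */
        int pinnedUnusedBytes() {
            return isReleasable() ? 0 : slab.capacity() - liveBytes;
        }
    }

    public static void main(String[] args) {
        Chunk chunk = new Chunk(64 * 1024);
        ByteBuffer[] shortLived = new ByteBuffer[15];
        for (int i = 0; i < 15; i++) shortLived[i] = chunk.allocate(4096);
        ByteBuffer cached = chunk.allocate(4096);   // e.g. held by a long-lived cache entry

        for (ByteBuffer b : shortLived) chunk.free(b);

        System.out.println("releasable: " + chunk.isReleasable());
        System.out.println("pinned but unused bytes: " + chunk.pinnedUnusedBytes());
    }
}
```

In this sketch the single 4 KiB slice that is never freed keeps the other 60 KiB of the slab reserved, which is the shape of the problem when short-lived and long-lived allocations share a chunk.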

But it's unclear from the data you've posted if this has anything to do with 
your significant memory usage.

I'm not used to the tooling you've posted images from, but it looks like 
there's 5.8GiB of buffers in total, and only around 500MiB of buffers that are 
reachable from the chunk cache or the buffer pool?  It looks like the first 
step is to figure out which (presumably global) variable that {{ByteBuffer[]}} 
corresponds to.
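
As a cross-check on totals like these, the {{java.nio:type=BufferPool,name=direct}} MBean from the issue description can also be read through the standard {{java.lang.management.BufferPoolMXBean}} interface. The sketch below is only an assumption about how one might do that: the class name {{DirectBufferStats}} is made up, and it reads the JVM it runs in, so inspecting a Cassandra node would need a remote JMX connection instead.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectBufferStats {
    /** Finds the platform MXBean that tracks direct ByteBuffers in this JVM. */
    static BufferPoolMXBean directPool() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName()))
                return pool;
        }
        throw new IllegalStateException("no direct buffer pool MXBean found");
    }

    public static void main(String[] args) {
        BufferPoolMXBean pool = directPool();
        // Count of live direct buffers, their summed capacity, and the JVM's
        // estimate of the native memory backing them.
        System.out.printf("count=%d totalCapacity=%d memoryUsed=%d%n",
                          pool.getCount(), pool.getTotalCapacity(), pool.getMemoryUsed());
    }
}
```

If the figure reported here is far below what a heap dump attributes to buffers, the difference would point at buffers this pool does not account for.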

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but 
> that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, 
> Screenshot_2019-02-04 Grafana - Cassandra.png
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly 
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure 
> looks like the "java.nio:type=BufferPool,name=direct" MBean shows a very 
> linear growth (approx 15MiB/24h, see attached screenshot). Is this expected 
> to keep growing linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it 
> would take quite a few days until it becomes noticeable. I'm able to see the 
> same type of slow growth in other production clusters even though the graph 
> data is noisier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

