[ https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781793#comment-16781793 ]

Benedict commented on CASSANDRA-15006:
--------------------------------------

{quote}Do you have any idea what is the source of these "objects with arbitrary 
lifetimes"?
{quote}
Yes, sorry if I wasn't clear.  The {{ChunkCache}} (roughly Cassandra 3.x's 
internal equivalent of the Linux page cache, but also covering 
post-decompression 'pages') uses Cassandra's {{BufferPool}}, which is designed 
for allocations that are freed in _near to_ the same sequence in which they 
were allocated.  The {{ChunkCache}}, however, is LRU, so its contents can 
remain there potentially forever, breaking this assumption.
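To illustrate the mismatch, here is a minimal hypothetical sketch (my own, not Cassandra's actual {{BufferPool}} code): chunks are bump-allocated out of a fixed-size unit, and the unit only becomes reusable once every chunk carved from it has been freed.  A single chunk retained indefinitely by an LRU cache pins the whole unit.

```java
// Hypothetical sketch of a recycling-unit allocator: 64KiB chunks are carved
// out of a 128KiB unit, and the unit is only reusable once ALL of its chunks
// have been freed. Names and structure are illustrative, not Cassandra's.
class RecyclingUnit {
    static final int UNIT_SIZE = 128 * 1024;
    static final int CHUNK_SIZE = 64 * 1024;  // the 3.x default compression chunk size

    private int offset = 0;       // bump-allocation offset within the unit
    private int outstanding = 0;  // chunks handed out and not yet freed

    // Carve the next chunk out of this unit; false once the unit is full.
    boolean allocate() {
        if (offset + CHUNK_SIZE > UNIT_SIZE)
            return false;
        offset += CHUNK_SIZE;
        outstanding++;
        return true;
    }

    void free() { outstanding--; }

    // The whole 128KiB is available for reuse only when every chunk is back.
    boolean recyclable() { return offset == UNIT_SIZE && outstanding == 0; }
}
```

If allocations are freed in roughly the order they were made, units recycle promptly.  But if one of the two chunks sits in an LRU cache, {{recyclable()}} stays false and the other 64KiB is stranded.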

The {{BufferPool}} allocates in units of 128KiB, meaning it only makes memory 
available for reuse once all 128KiB of a unit have been freed.  It looks like 
you have a 64KiB compression chunk size (the default for 3.x), so typically 
only pairs of allocations need to be freed together.  Even so, this is enough 
to leave many dangling, partially used 128KiB units whose unused portion is 
wasted for the time being.

It's up to you how you address this: lowering the configuration settings for 
these properties, raising your memory limits, or downgrading C*.  Memory 
should not grow unboundedly, only to some fraction above the normal chunk 
cache / buffer pool limits.  Certainly no more than twice, and I would 
anticipate no more than about 30% or so (but my maths is rusty, so I won't 
try to calculate a guess based on any assumed distribution).
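For what it's worth, the "no more than twice" figure can be sanity-checked with a back-of-envelope calculation (my own sketch, with illustrative names): in the worst case, every 128KiB unit is kept resident by exactly one live 64KiB chunk, so resident pool memory is at most unitSize / chunkSize = 2 times the memory actually in use.

```java
// Back-of-envelope worst case for buffer-pool fragmentation (a sketch, not
// code from Cassandra): if every unit stays resident because exactly one of
// its chunks is still live in the cache, then
//   resident / live = unitSize / chunkSize.
class FragmentationBound {
    // Worst-case ratio of resident pool memory to memory actually in use.
    static double worstCaseOverhead(int unitKiB, int chunkKiB) {
        return (double) unitKiB / chunkKiB;
    }
}
```

With the 3.x defaults (128KiB units, 64KiB chunks) the ratio is 2.0; a smaller compression chunk size would make the worst case proportionally worse.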

> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
>                 Key: CASSANDRA-15006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but 
> that did not seem to make any difference.
>            Reporter: Jonas Borgström
>            Priority: Major
>         Attachments: CASSANDRA-15006-reference-chains.png, 
> Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana 
> - Cassandra(1).png, Screenshot_2019-02-14 Grafana - Cassandra.png, 
> Screenshot_2019-02-15 Grafana - Cassandra.png, Screenshot_2019-02-22 Grafana 
> - Cassandra.png, Screenshot_2019-02-25 Grafana - Cassandra.png, 
> cassandra.yaml, cmdline.txt
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly 
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days, it 
> sure looks like the "java.nio:type=BufferPool,name=direct" MBean shows very 
> linear growth (approx 15MiB/24h, see attached screenshot). Is this expected 
> to keep growing linearly after 12 days with a constant load?
>  
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it 
> would take quite a few days until it becomes noticeable. I'm able to see the 
> same type of slow growth in other production clusters even though the graph 
> data is more noisy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
