[
https://issues.apache.org/jira/browse/CASSANDRA-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781793#comment-16781793
]
Benedict commented on CASSANDRA-15006:
--------------------------------------
{quote}Do you have any idea what is the source of these "objects with arbitrary
lifetimes"?
{quote}
Yes, sorry if I wasn't clear. The {{ChunkCache}} (which is Cassandra 3.x's
internal equivalent of the Linux page cache, but also for post-decompression
'pages') uses Cassandra's {{BufferPool}}, which is designed for allocations
that are freed in _near to_ the same sequence in which they were allocated.
The {{ChunkCache}} is LRU, however, so its contents can remain there
potentially forever, breaking this assumption.
The {{BufferPool}} allocates in units of 128KiB, meaning it only makes memory
available for reuse once all 128KiB of a unit have been freed. It looks like
you have a 64KiB compression chunk size (the default for 3.x), so typically
only pairs of allocations need to be freed together. Even so, this is enough
to leave many dangling, partially used 128KiB units whose unused portions are
wasted for the time being.
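To make the failure mode concrete, here is a toy sketch (not Cassandra's actual {{BufferPool}} code; the class and field names are invented for illustration) of 128KiB units carved into 64KiB chunks, where a unit is recycled only once every chunk cut from it has been freed. An LRU cache that pins one chunk per unit keeps every half-empty unit alive:

```python
UNIT_CHUNKS = 128 // 64  # 128KiB unit carved into 64KiB chunks

class ToyBufferPool:
    """Hypothetical model: recycle a unit only when all its chunks are freed."""
    def __init__(self):
        self.units = []     # units with at least one live chunk
        self.current = None # unit currently being carved up

    def allocate(self):
        # Start a new 128KiB unit once the current one is fully carved.
        if self.current is None or self.current["carved"] == UNIT_CHUNKS:
            self.current = {"carved": 0, "freed": 0}
            self.units.append(self.current)
        self.current["carved"] += 1
        return self.current  # hand back the chunk's backing unit

    def free(self, unit):
        unit["freed"] += 1
        # The unit becomes reusable only when every chunk is back.
        if unit["freed"] == UNIT_CHUNKS:
            self.units.remove(unit)

pool = ToyBufferPool()
pinned = []                # stands in for the LRU ChunkCache
for _ in range(100):
    a = pool.allocate()    # freed promptly, as the pool expects
    b = pool.allocate()    # retained indefinitely by the cache
    pool.free(a)
    pinned.append(b)

# One live chunk per unit holds all 100 units hostage,
# each with half its 128KiB unusable.
print(len(pool.units))  # → 100
```

Under this model, memory tied up in partially used units grows with the number of long-lived cache entries, which matches the slow linear growth reported in the issue.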
It's up to you how you address this: lowering the configuration settings for
these properties, raising your memory limits, or downgrading C*. Memory should
not grow unboundedly, only to some fraction above the normal chunk cache /
buffer pool limits - certainly no more than twice, and I would anticipate no
more than about 30% or so (but my maths is rusty, so I won't try to calculate
a guess based on any assumed distribution).
> Possible java.nio.DirectByteBuffer leak
> ---------------------------------------
>
> Key: CASSANDRA-15006
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15006
> Project: Cassandra
> Issue Type: Bug
> Environment: cassandra: 3.11.3
> jre: openjdk version "1.8.0_181"
> heap size: 2GB
> memory limit: 3GB (cgroup)
> I started one of the nodes with "-Djdk.nio.maxCachedBufferSize=262144" but
> that did not seem to make any difference.
> Reporter: Jonas Borgström
> Priority: Major
> Attachments: CASSANDRA-15006-reference-chains.png,
> Screenshot_2019-02-04 Grafana - Cassandra.png, Screenshot_2019-02-14 Grafana
> - Cassandra(1).png, Screenshot_2019-02-14 Grafana - Cassandra.png,
> Screenshot_2019-02-15 Grafana - Cassandra.png, Screenshot_2019-02-22 Grafana
> - Cassandra.png, Screenshot_2019-02-25 Grafana - Cassandra.png,
> cassandra.yaml, cmdline.txt
>
>
> While testing a 3 node 3.11.3 cluster I noticed that the nodes were suddenly
> killed by the Linux OOM killer after running without issues for 4-5 weeks.
> After enabling more metrics and leaving the nodes running for 12 days it sure
> looks like the
> "java.nio:type=BufferPool,name=direct" Mbean shows a very linear growth
> (approx 15MiB/24h, see attached screenshot). Is this expected to keep growing
> linearly after 12 days with a constant load?
>
> In my setup the growth/leak is about 15MiB/day so I guess in most setups it
> would take quite a few days until it becomes noticeable. I'm able to see the
> same type of slow growth in other production clusters even though the graph
> data is more noisy.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)