[
https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150172#comment-16150172
]
Markus Dlugi commented on CASSANDRA-13754:
------------------------------------------
[~snazy], I don't think the node is overloaded. I originally suspected that as
well, so I ran an experiment in which I capped the {{INSERT}}s in our load test,
reducing them from ~25,000 to ~10,000 per minute. As a consequence, the node
survived a little longer, but in the end it still died with an
{{OutOfMemoryError}} after more data had been inserted. So the problem is not
too many concurrent writes; the node fails after a certain total number of
writes, which indicates to me that a memory leak is indeed happening.
I also had another look at the heap dump I sent you, and you are correct that
the heap is mostly filled with {{BTree$Builder}} instances that still hold
objects in their {{values}} array. However, if you look closer, you will notice
that in each of these instances the first entries of the {{values}} array are
{{null}}, and the remaining content only comes after them. For some reason, the
content always starts at index 28, whereas indices 0-27 are {{null}}; I am not
sure whether this is a coincidence. You can also see that for all the
{{BTree$Builder}} objects the {{count}} attribute is 0, which indicates to me
that {{BTree$Builder.cleanup()}} has already run and that these are not active
writes. This theory is supported by the fact that my workaround of manually
calling {{FastThreadLocal.removeAll()}} actually works, because it means that
nothing except the {{FastThreadLocal}}s still holds references to the builders.
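The pattern seen in the heap dump (a {{count}} of 0 alongside a {{values}} array that is only partly cleared) can be reproduced with a minimal sketch. This is not Cassandra's {{BTree$Builder}}: the class {{PooledBuilder}}, its methods, and the idea that cleanup clears only a prefix of the array are all assumptions made for illustration. The sketch only shows how a reused, thread-local builder keeps objects reachable when its cleanup leaves part of the backing array untouched.

```java
import java.util.Arrays;

public class StaleBuilderSketch {

    /** Hypothetical reusable builder; NOT Cassandra's BTree.Builder. */
    static final class PooledBuilder {
        Object[] values = new Object[8];
        int count;

        void add(Object value) {
            // Grow the backing array when it is full.
            if (count == values.length)
                values = Arrays.copyOf(values, count * 2);
            values[count++] = value;
        }

        /** Assumed faulty cleanup: nulls only the first `cleared` slots, then resets count. */
        void partialCleanup(int cleared) {
            Arrays.fill(values, 0, Math.min(cleared, count), null);
            count = 0;
        }

        /** Cleanup that nulls the whole backing array before resetting count. */
        void fullCleanup() {
            Arrays.fill(values, null);
            count = 0;
        }

        /** Number of slots that still reference an object. */
        int retained() {
            int n = 0;
            for (Object v : values)
                if (v != null) n++;
            return n;
        }
    }

    // One builder per thread; it stays reachable for as long as the thread lives.
    static final ThreadLocal<PooledBuilder> BUILDER =
        ThreadLocal.withInitial(PooledBuilder::new);

    /** Fills this thread's builder, cleans it up, and reports how many entries remain. */
    static int retainedAfter(int added, int cleared, boolean full) {
        BUILDER.remove(); // start from a fresh builder on this thread
        PooledBuilder b = BUILDER.get();
        for (int i = 0; i < added; i++)
            b.add(new byte[16]);
        if (full)
            b.fullCleanup();
        else
            b.partialCleanup(cleared);
        return b.retained();
    }

    public static void main(String[] args) {
        // 40 entries added, only indices 0-27 cleared: 12 entries stay reachable.
        System.out.println(retainedAfter(40, 28, false));
        // Clearing the whole array leaves nothing behind.
        System.out.println(retainedAfter(40, 28, true));
    }
}
```

In the partial case the builder reports a {{count}} of 0 while indices 28 and up still hold references, which matches the shape of what the heap dump shows without saying anything about the actual cause in Cassandra.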
Therefore, I think we have two issues here:
# {{SEPWorker}} never cleans up its {{FastThreadLocal}}s and therefore
accumulates references to otherwise dead objects. Maybe we can add something
that at least removes non-static entries regularly?
# {{BTree$Builder}} seems to have a problem cleaning up properly after building,
so the objects referenced by the {{FastThreadLocal}}s of the {{SEPWorker}}
threads are very large and ultimately lead to the {{OutOfMemoryError}}s.
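The first point could be addressed along the lines of the following sketch, which wraps each task so that a cleanup hook runs on the worker thread once the task has finished. This is an illustration only and not {{SEPWorker}} code: the names {{withCleanup}}, {{SCRATCH}} and {{scratchSurvives}} are hypothetical, and a plain {{java.lang.ThreadLocal}} stands in for Netty's {{FastThreadLocal}} so that the example runs without extra dependencies. In a Netty-based worker the hook could be the {{FastThreadLocal.removeAll()}} call used in the workaround above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalCleanupSketch {

    // Stand-in for per-thread state that a task leaves behind (hypothetical).
    static final ThreadLocal<byte[]> SCRATCH = new ThreadLocal<>();

    /** Wraps a task so that the cleanup hook always runs on the same thread afterwards. */
    static Runnable withCleanup(Runnable task, Runnable cleanup) {
        return () -> {
            try {
                task.run();
            } finally {
                cleanup.run();
            }
        };
    }

    /** Returns true if state set by one task is still visible to the next task on the same thread. */
    static boolean scratchSurvives(boolean clean) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable work = () -> SCRATCH.set(new byte[1024]);
            Runnable cleanup;
            if (clean)
                cleanup = SCRATCH::remove; // drop the entry once the task is done
            else
                cleanup = () -> { };       // no cleanup: the entry outlives the task
            pool.submit(withCleanup(work, cleanup)).get();
            // The single worker thread is reused, so this sees whatever was left behind.
            return pool.submit(() -> SCRATCH.get() != null).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(scratchSurvives(false)); // true: state leaks into the next task
        System.out.println(scratchSurvives(true));  // false: the hook removed it
    }
}
```

Running the hook after every task is the simplest variant; cleaning up only periodically, as suggested above, would trade some retained memory for fewer removals.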
> FastThreadLocal leaks memory
> ----------------------------
>
> Key: CASSANDRA-13754
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
> Reporter: Eric Evans
> Assignee: Robert Stupp
> Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment,
> a heap analysis is showing that more than 10G of our 12G heaps are consumed
> by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}})
> of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances.
> Reverting
> [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
> fixes the issue.