[
https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713487#comment-16713487
]
Joseph Lynch commented on CASSANDRA-14922:
------------------------------------------
Alright, I think I'm narrowing in on this. I've managed to get all the threads
to die over in [a
branch|https://github.com/jolynch/cassandra/tree/CASSANDRA-14922] but we're
still leaking all the static state through the {{InstanceClassLoader}} instances. I
think I've narrowed it down to just three remaining references (and I _think_
only one of them is a strong reference); details are attached in the screenshots.
We basically just need to kill that last strong reference, and I believe the
whole {{InstanceClassLoader}} should become collectible at that point. The
static state and the self references to the classloaders should be ok, since
everything will be cut off from the GC root, I think.
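
To illustrate the idea of a classloader becoming collectible once its last
strong reference is gone, here is a minimal standalone sketch. It is not code
from the branch: {{ClassLoaderLeakCheck}}, {{lastStrongRoot}},
{{newTrackedLoader}} and {{isCollected}} are hypothetical names, and a plain
{{URLClassLoader}} stands in for {{InstanceClassLoader}}. It also assumes the
JVM honors {{System.gc()}} requests, which is not guaranteed.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderLeakCheck {
    // Hypothetical stand-in for the one remaining strong root
    // (for example a static field that still points at the loader).
    static Object lastStrongRoot;

    /** Creates a loader, roots it strongly, and returns only a weak handle to it. */
    static WeakReference<ClassLoader> newTrackedLoader() {
        // Stand-in for InstanceClassLoader: no URLs, bootstrap parent.
        ClassLoader loader = new URLClassLoader(new URL[0], null);
        lastStrongRoot = loader;
        return new WeakReference<>(loader);
    }

    /** Requests GC up to maxAttempts times; returns true once the referent is cleared. */
    static boolean isCollected(WeakReference<?> ref, int maxAttempts) {
        for (int i = 0; i < maxAttempts && ref.get() != null; i++) {
            System.gc(); // only a request; the JVM may ignore it
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<ClassLoader> ref = newTrackedLoader();
        System.out.println("collected while rooted: " + isCollected(ref, 3));

        // Kill the last strong reference; the weak reference alone
        // does not keep the loader alive.
        lastStrongRoot = null;
        System.out.println("collected after root cleared: " + isCollected(ref, 50));
    }
}
```

The same weak-reference check could in principle be pointed at a real
{{InstanceClassLoader}} after shutdown to confirm whether the leak is gone,
though a heap dump is still the more reliable tool for finding what holds it.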
> In JVM dtests need to clean up after instance shutdown
> ------------------------------------------------------
>
> Key: CASSANDRA-14922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14922
> Project: Cassandra
> Issue Type: Bug
> Components: Testing
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Minor
> Attachments: AllThreadsStopped.png, ClassLoadersRetaining.png,
> Leaking_Metrics_On_Shutdown.png, OnlyThreeRootsLeft.png
>
>
> Currently the unit tests are failing on circleci ([example
> one|https://circleci.com/gh/jolynch/cassandra/300#tests/containers/1],
> [example
> two|https://circleci.com/gh/rustyrazorblade/cassandra/44#tests/containers/1])
> because we use a small container (medium) for unit tests by default and the
> in-JVM dtests are leaking a few hundred megabytes of memory per test right
> now. This is not a big deal yet: the dtest runs with the larger containers
> continue to function fine, as does local testing, because the number of in-JVM
> dtests is not yet high enough to cause a problem with more than 2GB of
> available heap. However, we should fix the memory leak so that going forward
> we can add more in-JVM dtests without worry.
> I've been working with [~ifesdjeen] to debug, and the issue appears to be
> unreleased Table/Keyspace metrics (screenshot showing the leak attached). I
> believe that we have a few potential issues that are leading to the leaks:
> 1. The
> [{{Instance::shutdown}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/Instance.java#L328-L354]
> method is not successfully cleaning up all the metrics created by the
> {{CassandraMetricsRegistry}}
> 2. The
> [{{TestCluster::close}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L283]
> method is not waiting for all the instances to finish shutting down and
> cleaning up before continuing on
> 3. I'm not sure if this is an issue assuming we clear all metrics, but
> [{{TableMetrics::release}}|https://github.com/apache/cassandra/blob/4ae229f5cd270c2b43475b3f752a7b228de260ea/src/java/org/apache/cassandra/metrics/TableMetrics.java#L951]
> does not release all the metric references (which could leak them)
> I am working on a patch which shuts down everything and ensures that we do
> not leak memory.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)