[ https://issues.apache.org/jira/browse/CASSANDRA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742047#comment-16742047 ]
Benedict commented on CASSANDRA-14922:
--------------------------------------

Thanks for the review. I'll push an update shortly, with the test failures handled, and a slight tweak so that we do not misleadingly return a SerializableX when it is wrapped with an executor (since this cannot be serialized).

bq. we can remove thread-local works in this case since we're passing the right class loader, making references unreachable. Is that right?

By 'passing' do you mean to the thread factory? In which case, no, that shouldn't have an impact (I pass it only because it seems to make sense; it should still work without doing so, and seems to if I try). All we're really doing is ensuring that any thread that evaluates anything inside one of the classes loaded by the instance's classloader is shut down when the node is shut down, by passing the work to a thread on this executor.

> In JVM dtests need to clean up after instance shutdown
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14922
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14922
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Test/dtest
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: AllThreadsStopped.png, ClassLoadersRetaining.png,
> Leaking_Metrics_On_Shutdown.png, MainClassRetaining.png,
> MemoryReclaimedFix.png, Metaspace_Actually_Collected.png,
> OnlyThreeRootsLeft.png, no_more_references.png
>
>
> Currently the unit tests are failing on circleci ([example
> one|https://circleci.com/gh/jolynch/cassandra/300#tests/containers/1],
> [example
> two|https://circleci.com/gh/rustyrazorblade/cassandra/44#tests/containers/1])
> because we use a small container (medium) for unit tests by default and the
> in JVM dtests are leaking a few hundred megabytes of memory per test right
> now.
> This is not a big deal because the dtest runs with the larger containers
> continue to function fine, as does local testing, since the number of in JVM
> dtests is not yet high enough to cause a problem with more than 2GB of
> available heap. However, we should fix the memory leak so that going forward
> we can add more in JVM dtests without worry.
> I've been working with [~ifesdjeen] to debug, and the issue appears to be
> unreleased Table/Keyspace metrics (screenshot showing the leak attached). I
> believe that we have a few potential issues that are leading to the leaks:
> 1. The
> [{{Instance::shutdown}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/Instance.java#L328-L354]
> method is not successfully cleaning up all the metrics created by the
> {{CassandraMetricsRegistry}}
> 2. The
> [{{TestCluster::close}}|https://github.com/apache/cassandra/blob/f22fec927de7ac291266660c2f34de5b8cc1c695/test/distributed/org/apache/cassandra/distributed/TestCluster.java#L283]
> method is not waiting for all the instances to finish shutting down and
> cleaning up before continuing on
> 3. I'm not sure if this is an issue assuming we clear all metrics, but
> [{{TableMetrics::release}}|https://github.com/apache/cassandra/blob/4ae229f5cd270c2b43475b3f752a7b228de260ea/src/java/org/apache/cassandra/metrics/TableMetrics.java#L951]
> does not release all the metric references (which could leak them)
> I am working on a patch which shuts down everything and ensures that we do
> not leak memory.
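
For readers following Benedict's description of the executor approach, here is a minimal Java sketch of the idea. It is not the code from the patch: the class name InstanceExecutorSketch, its call, close and isTerminated methods, the thread naming and the 10 second timeout are all hypothetical choices made for illustration. It only shows work being routed onto threads owned by one instance, so that stopping the executor stops every thread that evaluated code from that instance's classloader.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; not the class used in the CASSANDRA-14922 patch. */
final class InstanceExecutorSketch implements AutoCloseable {
    private final ExecutorService executor;

    InstanceExecutorSketch(String instanceName, ClassLoader instanceLoader) {
        AtomicInteger ids = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, instanceName + "-isolated-" + ids.incrementAndGet());
            // Per the comment, setting the loader here is tidy but is not what
            // makes cleanup work; owning the thread is.
            t.setContextClassLoader(instanceLoader);
            t.setDaemon(true);
            return t;
        };
        this.executor = Executors.newCachedThreadPool(factory);
    }

    /** Runs the task on an instance-owned thread and blocks for its result. */
    <T> T call(Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for instance task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("instance task failed", e.getCause());
        }
    }

    /** Stops every thread that ever ran instance code, then waits for them to exit. */
    @Override
    public void close() {
        executor.shutdownNow();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("instance threads did not stop");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during shutdown", e);
        }
    }

    boolean isTerminated() {
        return executor.isTerminated();
    }
}
```

In this sketch the caller's own thread never executes the task body, so once close() returns no live thread of the executor remains to keep a reference into the instance's classes. Whether the classloader then becomes collectable still depends on other roots, such as the metrics references listed in the issue description.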