[
https://issues.apache.org/jira/browse/CASSANDRA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602771#comment-14602771
]
Benedict commented on CASSANDRA-9423:
-------------------------------------
So, it looks like we were creating strong circular reference leaks already in
2.1, severely damaging the utility of the leak detection. I've pushed a patch
[here|https://github.com/belliottsmith/cassandra/tree/9423] that:
# fixes these circular references;
# introduces two kinds of leak detection:
## Detects circular references directly, by periodically walking the object
graph of the ref state objects; and
## Detects potential strong leak candidates obliquely, by constructing the
total set of expected ref objects, and comparing them to those that are
actually extant; if an unexpected object remains extant across two such runs
(at fifteen minute intervals) it is reported as a leak *candidate*.
This last check is *not* perfect, as we could construct objects that we haven't
yet made visible in the tracker, for instance, but generally they should not
remain invisible for fifteen minutes. We can follow up with some improvements
to further guarantee this, but it should be good enough for now.
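As a rough illustration of the second check (not the code in the patch), the two-pass comparison could be sketched as below. The class name {{LeakCandidateScanner}} and its {{scan}} method are hypothetical, and the sketch assumes the caller can already enumerate both the extant and the expected ref state objects. It also holds suspects strongly between scans for simplicity, which a real implementation would probably want to avoid.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch: reports refs that are unexpectedly extant on two consecutive scans. */
class LeakCandidateScanner
{
    // Unexpected refs seen on the previous scan, compared by identity rather than equals()
    private Set<Object> previousSuspects = newIdentitySet();

    private static Set<Object> newIdentitySet()
    {
        return Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    }

    /**
     * @param extant   every ref state object currently alive (assumed to be enumerable)
     * @param expected every ref state object reachable through the tracker
     * @return refs that were unexpected on both this scan and the previous one
     */
    Set<Object> scan(Set<Object> extant, Set<Object> expected)
    {
        // Copy into an identity set so membership does not depend on equals()
        Set<Object> expectedById = newIdentitySet();
        expectedById.addAll(expected);

        // Suspects for this run: extant but not expected
        Set<Object> suspects = newIdentitySet();
        for (Object ref : extant)
            if (!expectedById.contains(ref))
                suspects.add(ref);

        // Only refs that were also suspects last time become leak candidates
        Set<Object> candidates = newIdentitySet();
        for (Object ref : suspects)
            if (previousSuspects.contains(ref))
                candidates.add(ref);

        previousSuspects = suspects;
        return candidates;
    }
}
```

Requiring a ref to be unexpected on two consecutive scans is what filters out objects that are merely not yet visible in the tracker at the moment of a single scan.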
Both run only when {{-Dcassandra.debugrefcount=true}} is set, so this will not
affect production systems in any way.
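For illustration only, gating a periodic check on that system property might look like the sketch below. {{DebugLeakChecks}} and {{maybeStart}} are hypothetical names, and the scheduling details are an assumption rather than what the patch does.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: schedule debug-only leak checks behind a system property. */
class DebugLeakChecks
{
    static final String PROPERTY = "cassandra.debugrefcount";

    /**
     * Starts the periodic check only if -Dcassandra.debugrefcount=true is set.
     * Returns null when the property is absent or false, so nothing is scheduled.
     */
    static ScheduledExecutorService maybeStart(Runnable check, long period, TimeUnit unit)
    {
        if (!Boolean.getBoolean(PROPERTY))
            return null;

        // Daemon thread so the checker never keeps the JVM alive on shutdown
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "leak-check");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleWithFixedDelay(check, period, period, unit);
        return executor;
    }
}
```

A caller would pass the check and the interval, for example {{DebugLeakChecks.maybeStart(check, 15, TimeUnit.MINUTES)}}.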
I've tagged this for 2.1, since the circular reference leaks affect it and the
remaining changes are a no-op for production systems.
It's worth recording that anonymous classes are *never* static, even if they
require no handle to their enclosing class, and this was the source of a
majority of the circular references. There were also others I had not
expected, which remained after fixing this and were likewise caught by this
debugging.
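To make the anonymous-class point concrete, here is a minimal sketch (again, not the patch itself) of a reflective reachability walk, applied to a cleanup object created three ways. {{ReachabilityCheck}}, {{Resource}} and the method names are all hypothetical. The walk follows only instance fields and ignores array elements, and whether an anonymous class that never touches its enclosing instance still holds a reference to it depends on the compiler, so that case is printed rather than assumed.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch: can target be reached from root by following instance fields? */
class ReachabilityCheck
{
    static boolean reaches(Object root, Object target)
    {
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> pending = new ArrayDeque<Object>();
        pending.push(root);
        while (!pending.isEmpty())
        {
            Object current = pending.pop();
            if (current == target)
                return true;
            if (!visited.add(current))
                continue; // already walked this object
            for (Class<?> c = current.getClass(); c != null; c = c.getSuperclass())
            {
                for (Field f : c.getDeclaredFields())
                {
                    if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers()))
                        continue;
                    try
                    {
                        f.setAccessible(true);
                        Object next = f.get(current);
                        if (next != null)
                            pending.push(next);
                    }
                    catch (RuntimeException | IllegalAccessException e)
                    {
                        // Field is not reflectively accessible (e.g. JDK internals); skip it
                    }
                }
            }
        }
        return false;
    }
}

/** Stand-in for a resource whose cleanup state must not keep the resource itself alive. */
class Resource
{
    void touch() {}

    // Anonymous class that uses the enclosing instance: always holds a reference to it
    Runnable capturingCleanup()
    {
        return new Runnable() { public void run() { touch(); } };
    }

    // Anonymous class that needs nothing from Resource, but is still an inner class
    Runnable anonymousCleanup()
    {
        return new Runnable() { public void run() {} };
    }

    // Static nested class: has no enclosing instance, so it cannot close a cycle
    static final class StaticCleanup implements Runnable
    {
        public void run() {}
    }

    Runnable staticCleanup()
    {
        return new StaticCleanup();
    }
}

class ReachabilityDemo
{
    public static void main(String[] args)
    {
        Resource r = new Resource();
        System.out.println("capturing anonymous reaches owner: " + ReachabilityCheck.reaches(r.capturingCleanup(), r));
        System.out.println("unused anonymous reaches owner:    " + ReachabilityCheck.reaches(r.anonymousCleanup(), r));
        System.out.println("static nested reaches owner:       " + ReachabilityCheck.reaches(r.staticCleanup(), r));
    }
}
```

Moving cleanup state into a static nested class, as in {{StaticCleanup}}, is the general shape of fix this implies: the state object then only references what it is explicitly given.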
> Improve Leak Detection to cover strong reference leaks
> ------------------------------------------------------
>
> Key: CASSANDRA-9423
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9423
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Priority: Critical
> Fix For: 2.1.8
>
>
> Currently we detect resources that we don't clean up that become unreachable.
> We could also detect references that appear to have leaked without becoming
> unreachable, by periodically scanning the set of extant refs, and checking if
> they are reachable via their normal means (if any); if their lifetime is
> unexpectedly long this likely indicates a problem, and we can log a
> warning/error.
> Assigning to myself to not forget it, since this may well help especially
> with [~tjake]'s concerns highlighted on 8099 for 3.0.