[ 
https://issues.apache.org/jira/browse/CASSANDRA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602771#comment-14602771
 ] 

Benedict commented on CASSANDRA-9423:
-------------------------------------

So, it looks like we were creating strong circular reference leaks already in 
2.1, severely damaging the utility of the leak detection. I've pushed a patch 
[here|https://github.com/belliottsmith/cassandra/tree/9423] that:

# fixes these circular references;
# introduces two kinds of leak detection:
## detects circular references directly, by periodically walking the object 
graph of the ref state objects; and
## detects potential strong leak candidates indirectly, by constructing the 
total set of expected ref objects and comparing it to the set that is 
actually extant; if an unexpected object remains extant across two such runs 
(at fifteen minute intervals) it is reported as a leak *candidate*.
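
As a rough illustration of the first kind of detection (not the code in the patch), a graph walk could start from a ref's cleanup state and report whether it can strongly reach the object it is meant to clean up after. All names below ({{StrongPathFinder}}, {{DemoResource}}, {{LeakyTidy}}, {{CleanTidy}}) are hypothetical, and the sketch simply skips fields it cannot open reflectively.

```java
import java.lang.ref.Reference;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper: can `state` strongly reach `owner` through instance fields? */
final class StrongPathFinder {
    static boolean reaches(Object state, Object owner) {
        // Identity-based visited set, so equals()/hashCode() overrides cannot confuse the walk.
        Set<Object> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Object> pending = new ArrayDeque<>();
        pending.push(state);
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            if (current == owner) return true;
            if (!visited.add(current)) continue;
            // Do not look inside weak/soft/phantom references: they are not strong paths.
            if (current instanceof Reference) continue;
            if (current instanceof Object[]) {
                for (Object element : (Object[]) current)
                    if (element != null) pending.push(element);
                continue;
            }
            for (Class<?> c = current.getClass(); c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers())) continue;
                    try {
                        f.setAccessible(true);
                        Object next = f.get(current);
                        if (next != null) pending.push(next);
                    } catch (RuntimeException | IllegalAccessException e) {
                        // Field is not reflectively accessible (e.g. JDK internals): skip it.
                    }
                }
            }
        }
        return false;
    }
}

/** Hypothetical resource whose cleanup state must not point back at the resource. */
final class DemoResource {
    final Object tidy;
    DemoResource(boolean leaky) {
        tidy = leaky ? new LeakyTidy(this) : new CleanTidy("resource-1");
    }
}

/** Cleanup state that keeps its owner strongly reachable (the bug). */
final class LeakyTidy {
    final Object owner;
    LeakyTidy(Object owner) { this.owner = owner; }
}

/** Cleanup state that keeps only what it needs. */
final class CleanTidy {
    final String name;
    CleanTidy(String name) { this.name = name; }
}
```

With these assumptions, {{StrongPathFinder.reaches(resource.tidy, resource)}} would be true for a {{DemoResource}} built with {{leaky = true}} and false otherwise.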

This last check is *not* perfect, as we could construct objects that we haven't 
yet made visible in the tracker, for instance, but generally they should not 
remain invisible for fifteen minutes. We can follow up with some improvements 
to further guarantee this, but it should be good enough for now.
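
To make the second check concrete, here is a minimal sketch of the "unexpected across two runs" rule. It is an illustration under assumptions, not the patch itself: {{LeakCandidateScanner}} and its methods are hypothetical names, and how the extant and expected sets are gathered is left to the caller.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical sketch: report refs that are extant but unexpected on two consecutive scans. */
final class LeakCandidateScanner {
    // Refs that were extant but unexpected during the previous scan.
    private Set<Object> suspectsFromLastRun = identitySet();

    /** Returns the refs that were unexpected both in the previous scan and in this one. */
    Set<Object> scan(Set<Object> extant, Set<Object> expected) {
        Set<Object> unexpected = identitySet();
        for (Object ref : extant)
            if (!expected.contains(ref)) unexpected.add(ref);

        Set<Object> candidates = identitySet();
        for (Object ref : unexpected)
            if (suspectsFromLastRun.contains(ref)) candidates.add(ref);

        // Remember this run's suspects so the next run can confirm or clear them.
        suspectsFromLastRun = unexpected;
        return candidates;
    }

    /** Identity-based set, so two distinct refs are never merged by equals(). */
    static Set<Object> identitySet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }
}
```

A scheduler (for example a {{ScheduledExecutorService}} with a fifteen minute delay) could call {{scan}} periodically. An object that is constructed but not yet visible in the tracker is only reported if it is still unexpected on the following run, which is the imperfection described above.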

Both run only when {{-Dcassandra.debugrefcount=true}} is set, so this will not 
affect production systems in any way.
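
For illustration, gating on that system property could look like the following. The class name {{DebugRefCountFlag}} is hypothetical, and the actual code may well read the flag differently (for instance once, into a static final field).

```java
/** Hypothetical gate: debugging checks run only if -Dcassandra.debugrefcount=true is set. */
final class DebugRefCountFlag {
    static boolean enabled() {
        // Boolean.getBoolean returns true only if the named system property
        // exists and equals "true" (ignoring case).
        return Boolean.getBoolean("cassandra.debugrefcount");
    }
}
```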

I've tagged this for 2.1, since the circular reference leaks affect it and the 
remaining changes are a no-op for production systems.

It's worth recording that anonymous classes are *never* static: one created in 
an instance context holds a reference to its enclosing instance even if it 
never uses it, and this was the source of the majority of the circular 
references. There were also others I had not expected, which remained after 
fixing this and were likewise caught by this debugging.
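
The point about anonymous classes can be seen in a small sketch. All names here ({{DemoSegment}}, {{Tidy}}, {{holdsSegment}}) are hypothetical. The anonymous {{Runnable}} is an inner class of {{DemoSegment}}, whereas the static nested {{Tidy}} can only reach what it is handed explicitly. Note the assumption about the compiler: javac versions of the 2.1 era stored the hidden enclosing-instance field unconditionally, while newer javac releases may omit it when it is unused, so the result for the anonymous class depends on the compiler.

```java
import java.lang.reflect.Field;

/** Hypothetical resource with two ways of building its cleanup action. */
final class DemoSegment {
    final byte[] buffer = new byte[1024];

    // Anonymous class created in an instance method: it is an inner class and
    // cannot be declared static, even though it never touches DemoSegment.
    Runnable anonymousTidy() {
        return new Runnable() {
            @Override
            public void run() { System.out.println("released"); }
        };
    }

    // Static nested class: it has no enclosing instance at all.
    static final class Tidy implements Runnable {
        private final String name;
        Tidy(String name) { this.name = name; }
        @Override
        public void run() { System.out.println("released " + name); }
    }

    Runnable staticTidy() {
        return new Tidy("segment-1");
    }

    /** True if the object's class declares a field of type DemoSegment. */
    static boolean holdsSegment(Object o) {
        for (Field f : o.getClass().getDeclaredFields())
            if (f.getType() == DemoSegment.class) return true;
        return false;
    }
}
```

{{DemoSegment.holdsSegment(segment.staticTidy())}} is always false, so a cleanup action written this way cannot pin the segment; the same call on {{segment.anonymousTidy()}} shows whether the compiler in use kept the hidden reference.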

> Improve Leak Detection to cover strong reference leaks
> ------------------------------------------------------
>
>                 Key: CASSANDRA-9423
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9423
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Priority: Critical
>             Fix For: 2.1.8
>
>
> Currently we detect resources that we don't clean up that become unreachable. 
> We could also detect references that appear to have leaked without becoming 
> unreachable, by periodically scanning the set of extant refs, and checking if 
> they are reachable via their normal means (if any); if their lifetime is 
> unexpectedly long this likely indicates a problem, and we can log a 
> warning/error.
> Assigning to myself to not forget it, since this may well help especially 
> with [~tjake]'s concerns highlighted on 8099 for 3.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
