[ 
https://issues.apache.org/jira/browse/HADOOP-12707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098508#comment-15098508
 ] 

Chris Nauroth commented on HADOOP-12707:
----------------------------------------

bq. ...we could consider using weak references, or at least having some idleness

I thought about weak references at one point but abandoned the idea for a few reasons:

# Weak references would provide different semantics compared to the current 
cache implementation.  Right now, applications can get a {{FileSystem}}, use 
it, abandon it for a long time (no longer holding a reference to it), and then 
come back and still find that instance sitting in the cache.  With weak 
references, there would be a race condition if a GC swept away the instance 
before the application came back to claim it again.  This could change 
performance and load patterns in negative ways if clients need to reconnect 
sockets.  Whether or not it would really be problematic in practice is unclear, 
but there is a ton of ecosystem and application code that would need testing.  
Any such testing would be difficult and perhaps unconvincing, because it would 
be dependent on external factors like heap configuration, and ultimately, the 
timing of GC is non-deterministic anyway.
# The cache goes beyond just resource management and is actually coupled to 
some logic that implements the API contract.  Delete-on-exit becomes 
problematic.  With weak references, I suppose we'd have to trigger the deletes 
from {{finalize}}.  I've experienced a lot of bugs in other codebases that rely 
on a finalizer to do significant work, so I have an immediate aversion to this. 
 I also worry about tricky interactions with the {{ClientFinalizer}} shutdown 
hook.  I guess an alternative could be to somehow "resurrect" a reaped instance 
that still has pending delete-on-exit work, but I expect this would be complex.
# Even assuming we do a correct bug-free implementation of delete-on-exit from 
a finalizer, it still changes the semantics.  The deletions are triggered from 
the {{close}} method.  Many applications don't bother calling {{close}} at all. 
 In that case, the deletes would happen during the {{ClientFinalizer}} shutdown 
hook.  Effectively, these applications expect that delete-on-exit means 
delete-on-process-exit.  They might even be calling {{cancelDeleteOnExit}} to 
cancel a prior queued delete.  If a weak-referenced instance drops out because 
of GC, then the deletes would happen at an unexpected time, all of that prior 
state would be lost, and the application would lose its opportunity to call 
{{cancelDeleteOnExit}}.
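
To make the race in point 1 concrete, here is a minimal, hypothetical sketch of a weak-reference-based cache (this is illustrative code, not Hadoop's actual {{Cache}}; the class and key names are invented). The point is that a lookup can silently rebuild the instance after a GC cycle, changing the identity/reuse semantics callers rely on today:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a FileSystem-style cache holding weak references.
// All names here are invented for illustration.
public class WeakCacheSketch {
    static final ConcurrentHashMap<String, WeakReference<Object>> CACHE =
            new ConcurrentHashMap<>();

    static Object get(String key) {
        WeakReference<Object> ref = CACHE.get(key);
        Object cached = (ref == null) ? null : ref.get();
        if (cached == null) {
            // Cache miss *or* the GC already cleared the referent: either way
            // we must rebuild, which is where reconnect cost would reappear.
            cached = new Object(); // stands in for an expensive FileSystem
            CACHE.put(key, new WeakReference<>(cached));
        }
        return cached;
    }

    public static void main(String[] args) {
        Object fs1 = get("hdfs://ns1");  // first lookup creates the instance
        Object fs2 = get("hdfs://ns1");  // strong ref still live: same instance
        System.out.println(fs1 == fs2);  // true while fs1 is strongly reachable
        fs1 = fs2 = null;                // application "abandons" the instance
        System.gc();                     // only a hint; clearing is not guaranteed
        // After a real GC cycle, the next get() may build a brand-new instance
        // at a non-deterministic time -- the race described above.
    }
}
```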

An idleness policy would have similar challenges.  It's hard to identify 
idleness correctly within the {{FileSystem}} layer, because current 
applications expect they can come back to the cache at any arbitrary time and 
get the same instance again.

Effectively, this cache isn't just a cache.  It's really a set of global 
variables where clients expect that they can do stateful interactions.  
Pitfalls of global variables, blah blah blah...  :-)

I don't like the current implementation, but I also don't see a safe way 
forward that is backwards-compatible.  If I had my preference, we'd 
re-implement it with explicit reference counting, require by contract that all 
callers call {{close}} when finished, and perhaps try to scrap the 
delete-on-exit feature entirely.  There has been pushback on the 
reference-counting proposal in the past, but I don't remember the exact JIRA.
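
For what it's worth, a minimal sketch of that reference-counted contract might look like the following (all names are invented for illustration; this is not an actual Hadoop API). Every caller that acquires an instance must release it, and teardown happens deterministically when the last reference goes away, with no finalizer involved:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an explicitly reference-counted cache.
// Names and structure are invented, not Hadoop code.
public class RefCountedCache {
    static class Entry {
        int refCount;
        boolean closed;
        void doClose() { closed = true; } // stands in for FileSystem teardown
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Caller contract: every acquire() must be paired with a release().
    public synchronized Entry acquire(String key) {
        Entry e = cache.computeIfAbsent(key, k -> new Entry());
        e.refCount++;
        return e;
    }

    public synchronized void release(String key) {
        Entry e = cache.get(key);
        if (e != null && --e.refCount == 0) {
            cache.remove(key);
            e.doClose(); // deterministic teardown, no finalizer involved
        }
    }
}
```

Delete-on-exit work, if kept at all, could then run inside {{doClose}} at a predictable time instead of at the whim of the GC.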

Maybe explore backwards-incompatible changes for 3.x?

> key of FileSystem inner class Cache contains UGI.hashCode which uses the 
> default hashCode method, leading to a memory leak
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-12707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12707
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.1
>            Reporter: sunhaitao
>            Assignee: sunhaitao
>
> The FileSystem.get(conf) method by default gets the fs object from 
> CACHE, but the key of the CACHE contains ugi.hashCode, which uses the 
> default hashCode method of subject instead of the hashCode method overridden 
> by subject.
>    @Override
>       public int hashCode() {
>         return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
>       }
> In this case, even for the same user, calling FileSystem.get(conf) twice 
> creates two different keys. Over a long duration, this leads to a memory 
> leak.
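
The leak mechanism the report describes can be shown with a small, self-contained demo (illustrative only, not Hadoop's actual {{Cache.Key}}; all names are invented). When a cache key's hashCode and equals depend on the *identity* of a wrapped user object, two logically identical users still produce two distinct entries, so entries accumulate instead of being reused:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not Hadoop code): a cache key whose hashCode/equals
// depend on the identity of a wrapped object. Two logically identical users
// still produce two distinct cache entries.
public class IdentityKeyLeak {
    static class Key {
        final String scheme;
        final Object user; // stands in for the Subject/UGI

        Key(String scheme, Object user) { this.scheme = scheme; this.user = user; }

        @Override public int hashCode() {
            // identity hash of the user object, mimicking the reported behavior
            return scheme.hashCode() + System.identityHashCode(user);
        }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).scheme.equals(scheme)
                    && ((Key) o).user == user; // identity comparison
        }
    }

    static int demo() {
        Map<Key, String> cache = new HashMap<>();
        // Two lookups that construct a fresh user object each time, as a fresh
        // Subject would on each login:
        cache.put(new Key("hdfs", new Object()), "fs-instance-1");
        cache.put(new Key("hdfs", new Object()), "fs-instance-2");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2: entries accumulate instead of reusing
    }
}
```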



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
