[
https://issues.apache.org/jira/browse/HADOOP-12707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098508#comment-15098508
]
Chris Nauroth commented on HADOOP-12707:
----------------------------------------
bq. ...we could consider using weak references, or at least having some idleness
I thought about weak references at one point but abandoned it for a few reasons:
# Weak references would provide different semantics compared to the current
cache implementation. Right now, applications can get a {{FileSystem}}, use
it, abandon it for a long time (no longer holding a reference to it), and then
come back and still find that instance sitting in the cache. With weak
references, there would be a race condition if a GC swept away the instance
before the application came back to claim it again. This could change
performance and load patterns in negative ways if clients need to reconnect
sockets. Whether or not it would really be problematic in practice is unclear,
but there is a ton of ecosystem and application code that would need testing.
Any such testing would be difficult and perhaps unconvincing, because it would
be dependent on external factors like heap configuration, and ultimately, the
timing of GC is non-deterministic anyway.
# The cache goes beyond just resource management and is actually coupled to
some logic that implements the API contract. Delete-on-exit becomes
problematic. With weak references, I suppose we'd have to trigger the deletes
from {{finalize}}. I've experienced a lot of bugs in other codebases that rely
on a finalizer to do significant work, so I have an immediate aversion to this.
I also worry about tricky interactions with the {{ClientFinalizer}} shutdown
hook. I guess an alternative could be to somehow "resurrect" a reaped instance
that still has pending delete-on-exit work, but I expect this would be complex.
# Even assuming we do a correct bug-free implementation of delete-on-exit from
a finalizer, it still changes the semantics. The deletions are triggered from
the {{close}} method. Many applications don't bother calling {{close}} at all.
In that case, the deletes would happen during the {{ClientFinalizer}} shutdown
hook. Effectively, these applications expect that delete-on-exit means
delete-on-process-exit. They might even be calling {{cancelDeleteOnExit}} to
cancel a prior queued delete. If a weak-referenced instance drops out because
of GC, then the deletes would happen at an unexpected time, all of that prior
state would be lost, and the application would lose its opportunity to call
{{cancelDeleteOnExit}}.
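To make the race in point 1 concrete, here is a minimal weak-valued cache sketch (hypothetical code, not the actual Hadoop cache): once the caller drops its last strong reference, a GC sweep may clear the entry, so a later lookup for the same key can come back empty and the expensive value has to be rebuilt.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: values held only through WeakReference.
// After the caller abandons its strong reference, get() may return
// null following a GC sweep, forcing re-creation (for a FileSystem,
// that could mean reconnecting sockets).
public class WeakValueCache<K, V> {
  private final Map<K, WeakReference<V>> map = new HashMap<>();

  public synchronized V get(K key) {
    WeakReference<V> ref = map.get(key);
    return ref == null ? null : ref.get(); // may be null after GC
  }

  public synchronized void put(K key, V value) {
    // Note: cleared WeakReference entries linger until explicitly
    // pruned; a real implementation would use a ReferenceQueue.
    map.put(key, new WeakReference<>(value));
  }
}
```

Whether and when an entry disappears depends on heap pressure and GC timing, which is exactly why testing this behavior across the ecosystem would be unconvincing.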
An idleness policy would have similar challenges. It's hard to identify
idleness correctly within the {{FileSystem}} layer, because current
applications expect they can come back to the cache at any arbitrary time and
get the same instance again.
Effectively, this cache isn't just a cache. It's really a set of global
variables where clients expect that they can do stateful interactions.
Pitfalls of global variables, blah blah blah... :-)
I don't like the current implementation, but I also don't see a safe way
forward that is backwards-compatible. If I had my preference, we'd
re-implement it with explicit reference counting, require by contract that all
callers call {{close}} when finished, and perhaps try to scrap the
delete-on-exit feature entirely. There has been pushback on the
reference-counting proposal in the past, but I don't remember the exact JIRA.
Maybe explore backwards-incompatible changes for 3.x?
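A minimal sketch of what the reference-counting approach could look like, assuming the contract that every caller balances each acquire with a release (class and method names here are illustrative, not actual Hadoop APIs): the cached instance is only evicted when the last reference is released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical reference-counted cache: acquire() bumps a counter,
// release() decrements it, and the entry is evicted only when the
// count reaches zero. The real proposal would close the underlying
// FileSystem at eviction time.
public class RefCountedCache<K, V> {
  private static final class Entry<V> {
    final V value;
    final AtomicInteger refs = new AtomicInteger();
    Entry(V value) { this.value = value; }
  }

  private final Map<K, Entry<V>> map = new HashMap<>();

  // Callers must balance every acquire() with exactly one release().
  public synchronized V acquire(K key, Supplier<V> factory) {
    Entry<V> e = map.computeIfAbsent(key, k -> new Entry<>(factory.get()));
    e.refs.incrementAndGet();
    return e.value;
  }

  // Returns true when this release evicted the entry (last reference gone).
  public synchronized boolean release(K key) {
    Entry<V> e = map.get(key);
    if (e != null && e.refs.decrementAndGet() == 0) {
      map.remove(key); // real code would also close the resource here
      return true;
    }
    return false;
  }
}
```

The weakness, of course, is the contract itself: it only works if every caller actually calls {{close}}, which is exactly the behavior many existing applications don't exhibit.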
> key of FileSystem inner class Cache contains UGI.hashCode which uses the
> default hashCode method, leading to the memory leak
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-12707
> URL: https://issues.apache.org/jira/browse/HADOOP-12707
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.7.1
> Reporter: sunhaitao
> Assignee: sunhaitao
>
> By default, the FileSystem.get(conf) method gets the fs object from the
> CACHE, but the key of the CACHE contains ugi.hashCode, which uses the
> default hashCode method of Subject instead of the hashCode method
> overridden by Subject.
> {code}
> @Override
> public int hashCode() {
>   return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
> }
> {code}
> In this case, even for the same user, calling FileSystem.get(conf) twice
> creates two different keys. Over a long duration, this leads to a memory
> leak.
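A self-contained illustration of the reported key problem, using simplified stand-in types rather than the actual Hadoop classes: when the key's hashCode and equals fall back to object identity for the user component, two lookups by the same logical user never match, so cache entries accumulate.

```java
// Hypothetical stand-ins for UGI/Subject and Cache.Key. User deliberately
// does not override hashCode/equals, so identity semantics apply.
final class User {
  final String name;
  User(String name) { this.name = name; }
}

final class CacheKey {
  final String schemeAuthority;
  final User user;

  CacheKey(String schemeAuthority, User user) {
    this.schemeAuthority = schemeAuthority;
    this.user = user;
  }

  @Override
  public int hashCode() {
    // Mirrors the quoted Key.hashCode(): the user's identity hash leaks in.
    return schemeAuthority.hashCode() + System.identityHashCode(user);
  }

  @Override
  public boolean equals(Object o) {
    // Identity comparison on user, like the default Object.equals.
    return o instanceof CacheKey
        && ((CacheKey) o).schemeAuthority.equals(schemeAuthority)
        && ((CacheKey) o).user == user;
  }
}
```

With this key, two calls that each construct a fresh User("alice") produce keys that never compare equal, so the cache grows with every call.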
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)