[
https://issues.apache.org/jira/browse/HADOOP-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183617#comment-17183617
]
Haibo Chen commented on HADOOP-17214:
-------------------------------------
The cache inside FileSystem is a plain HashMap whose key is partially based on
the UGI. Whenever UGI.loginWithKeytab() is called, the underlying UGI object
changes, so the previous key-value pair is left unused in the cache and new
entries keep being added. In effect, the cache leaks memory.
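To illustrate the mechanism, here is a minimal, self-contained sketch that
does not use Hadoop at all. FakeUgi and Key are hypothetical stand-ins for the
UGI and the cache key, and the sketch assumes the UGI is compared by object
identity, so every fresh login yields a key that never matches an earlier one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CacheLeakSketch {
    /** Hypothetical stand-in for a UGI; deliberately has no equals/hashCode override. */
    static final class FakeUgi {
        final String user;
        FakeUgi(String user) { this.user = user; }
    }

    /** Hypothetical cache key: scheme + authority + UGI (UGI compared by identity). */
    static final class Key {
        final String scheme;
        final String authority;
        final FakeUgi ugi;

        Key(String scheme, String authority, FakeUgi ugi) {
            this.scheme = scheme;
            this.authority = authority;
            this.ugi = ugi;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            // Assumption: two keys match only if they hold the very same UGI object.
            return scheme.equals(k.scheme) && authority.equals(k.authority) && ugi == k.ugi;
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, authority, System.identityHashCode(ugi));
        }
    }

    /**
     * Performs n lookups against a fresh HashMap-backed cache and returns its size.
     * If reuseLogin is false, each lookup is preceded by a simulated re-login
     * that produces a new FakeUgi object.
     */
    static int cacheSizeAfter(int n, boolean reuseLogin) {
        Map<Key, Object> cache = new HashMap<>();
        FakeUgi ugi = new FakeUgi("svc");
        for (int i = 0; i < n; i++) {
            if (!reuseLogin) {
                ugi = new FakeUgi("svc"); // models a re-login replacing the UGI object
            }
            cache.computeIfAbsent(new Key("hdfs", "nn:8020", ugi), k -> new Object());
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("reused login: " + cacheSizeAfter(1000, true));
        System.out.println("re-login each time: " + cacheSizeAfter(1000, false));
    }
}
```

With a reused login the cache holds a single entry; with a re-login before
each lookup it grows by one entry per lookup, and nothing ever evicts the
stale ones.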
I don't think this subtle behavior is documented anywhere, and we have seen
many FileSystem users follow this pattern, where UGI.loginWithKeytab() may be
called concurrently from multiple threads. Over time, this fills the JVM heap
with leaked instances in the FileSystem cache.
For most of our internal FileSystem implementations (and open source ones
too), caching is usually left enabled, which is the default, so we end up
discovering this memory leak only in production.
Having a global flag would allow us to avoid such issues in our use cases.
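For comparison, the existing per-scheme switch follows the property pattern
fs.<scheme>.impl.disable.cache. The sketch below only builds those property
names for a list of schemes; the class and method names are hypothetical, and
it uses a plain Map rather than Hadoop's Configuration so that it runs
standalone.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DisableCachePerScheme {
    /** Builds the per-scheme property name, e.g. fs.hdfs.impl.disable.cache. */
    static String disableCacheKey(String scheme) {
        return String.format("fs.%s.impl.disable.cache", scheme);
    }

    /** Returns one "disable cache" property per scheme, in the order given. */
    static Map<String, String> disableCacheFor(List<String> schemes) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scheme : schemes) {
            props.put(disableCacheKey(scheme), "true");
        }
        return props;
    }

    public static void main(String[] args) {
        // The scheme list is illustrative; every scheme in use needs its own entry.
        Map<String, String> props = disableCacheFor(Arrays.asList("hdfs", "file", "s3a"));
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Each scheme needs its own property, and any scheme left off the list keeps
caching enabled, which is the gap a single global setting would close.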
> Allow file system caching to be disabled for all file systems
> -------------------------------------------------------------
>
> Key: HADOOP-17214
> URL: https://issues.apache.org/jira/browse/HADOOP-17214
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Affects Versions: 3.3.0
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Major
>
> Right now, FileSystem.get(URI uri, Configuration conf) allows caching of file
> systems to be disabled per scheme.
> We can introduce a new global conf to disable caching for all FileSystems;
> the default would be false (i.e. do not disable the cache globally).