[
https://issues.apache.org/jira/browse/HADOOP-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677254#action_12677254
]
dhruba borthakur commented on HADOOP-4655:
------------------------------------------
I would like to resurrect the discussion on this issue.
I am in the process of integrating a distributed log aggregation service called
Scribe (http://developers.facebook.com/scribe/) so that it can write directly
to HDFS. The Scribe server invokes HDFS APIs using libhdfs. Scribe is
multi-threaded, and each thread should be able to independently
open files from different HDFS clusters. It would be very convenient for Scribe
to invoke hdfsConnect() just before opening each file and, similarly,
to invoke hdfsDisconnect() after closing it. This would greatly reduce the
complexity of the Scribe-HDFS integration. However, with the
current caching behaviour of FileSystem.get(), the above is not possible.
Scribe has to maintain reference counts for each open and understand the
caching behaviour of FileSystem objects.
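
To illustrate the problem described above, here is a minimal toy model in Java.
It does not use Hadoop: ToyFs, CachedFsSketch and the cluster URI are
hypothetical stand-ins, and the map only mimics the idea of one cached instance
per URI. It shows why a caller that closes "its" handle can affect another
caller that obtained the same handle from the cache.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop's FileSystem classes.
final class CachedFsSketch {

    /** Stand-in for a file system handle that can be closed. */
    static final class ToyFs {
        final String uri;
        private boolean closed;

        ToyFs(String uri) { this.uri = uri; }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    // One shared instance per URI, loosely mimicking a cache behind a get() call.
    private static final Map<String, ToyFs> CACHE = new HashMap<>();

    /** Returns the cached instance for the URI, creating it on first use. */
    static synchronized ToyFs get(String uri) {
        return CACHE.computeIfAbsent(uri, ToyFs::new);
    }

    public static void main(String[] args) {
        String uri = "hdfs://cluster1:9000";   // hypothetical cluster address
        ToyFs threadA = get(uri);              // e.g. obtained by one writer thread
        ToyFs threadB = get(uri);              // e.g. obtained by another writer thread

        threadA.close();                       // A is done with its file

        System.out.println(threadA == threadB);   // prints "true": same object
        System.out.println(threadB.isClosed());   // prints "true": B's handle is closed too
    }
}
```

Without reference counting, each caller in this model would have to know
whether anyone else still holds the shared instance before closing it.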
I would like to implement the second option that Doug has suggested, i.e.
"remove the cache, and allocate a new FileSystem instance per call to
getFileSystem()". However, this would be an incompatible change, because
pre-existing applications might rely on the current singleton behaviour to
open file systems. To ensure that existing implementations do not
break, we can introduce a new method FileSystem.getNewFileSystem() that skips
the CACHE and allocates a new FileSystem object every time it is invoked.
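
The proposal above can be sketched with the same kind of toy model. Again, this
is not Hadoop code: ToyFs, NewInstanceSketch, get() and getNew() are
hypothetical names standing in for the cached lookup and for the proposed
cache-skipping method. The sketch only shows the intended contract, namely that
existing callers keep the shared instance while new callers get a private one
they can close freely.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop's FileSystem classes.
final class NewInstanceSketch {

    /** Stand-in for a file system handle that can be closed. */
    static final class ToyFs {
        final String uri;
        private boolean closed;

        ToyFs(String uri) { this.uri = uri; }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    private static final Map<String, ToyFs> CACHE = new HashMap<>();

    /** Existing behaviour in this model: one shared, cached instance per URI. */
    static synchronized ToyFs get(String uri) {
        return CACHE.computeIfAbsent(uri, ToyFs::new);
    }

    /** Proposed behaviour in this model: skip the cache, allocate a fresh instance. */
    static ToyFs getNew(String uri) {
        return new ToyFs(uri);
    }

    public static void main(String[] args) {
        String uri = "hdfs://cluster1:9000";   // hypothetical cluster address
        ToyFs shared = get(uri);               // what existing applications receive
        ToyFs mine = getNew(uri);              // what a caller like Scribe would receive

        mine.close();                          // closing the private instance...

        System.out.println(shared.isClosed()); // prints "false": shared instance untouched
        System.out.println(mine == get(uri));  // prints "false": never entered the cache
    }
}
```

The trade-off in this model is that every getNew() caller owns its instance and
must close it, otherwise the resources it holds are leaked.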
> FileSystem.CACHE should be ref-counted
> --------------------------------------
>
> Key: HADOOP-4655
> URL: https://issues.apache.org/jira/browse/HADOOP-4655
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs, fs
> Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1, 0.20.0
> Reporter: Hong Tang
> Assignee: dhruba borthakur
>
> FileSystem.CACHE is not ref-counted, and could lead to resource leakage.