[ 
https://issues.apache.org/jira/browse/HADOOP-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677254#action_12677254
 ] 

dhruba borthakur commented on HADOOP-4655:
------------------------------------------

I would like to resurrect the discussion on this issue.

I am in the process of integrating a distributed log aggregation service called 
Scribe (http://developers.facebook.com/scribe/) so that it can store logs 
directly in HDFS. The Scribe server invokes HDFS APIs through libhdfs. Scribe 
is multi-threaded, and each thread should be able to independently open files 
on different HDFS clusters. It would be very convenient for Scribe to invoke 
hdfsConnect() just before opening each file and, similarly, to invoke 
hdfsDisconnect() after closing it. This would greatly reduce the complexity of 
the Scribe-HDFS integration. However, with the current caching behaviour of 
FileSystem.get(), this is not possible: Scribe has to maintain reference counts 
for each open and understand the caching behaviour of FileSystem objects.
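
To make that bookkeeping concrete, here is a rough sketch of the kind of 
reference counting a caller would have to carry around. It is written in Java 
purely for illustration (Scribe itself goes through libhdfs), and SharedHandle 
and RefCountedPool are hypothetical names, not Hadoop or Scribe APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached connection shared by several threads.
class SharedHandle {
    final String uri;
    int refs = 0;            // number of callers currently using this handle
    boolean closed = false;

    SharedHandle(String uri) { this.uri = uri; }
}

// Hypothetical pool: every acquire() must be paired with exactly one release().
class RefCountedPool {
    private final Map<String, SharedHandle> handles = new HashMap<>();

    // Returns the shared handle for this URI, creating it on first use.
    synchronized SharedHandle acquire(String uri) {
        SharedHandle h = handles.computeIfAbsent(uri, SharedHandle::new);
        h.refs++;
        return h;
    }

    // Closes the underlying handle only when the last user lets go of it.
    synchronized void release(SharedHandle h) {
        if (h.refs <= 0) {
            throw new IllegalStateException("release without matching acquire");
        }
        if (--h.refs == 0) {
            h.closed = true;
            handles.remove(h.uri);
        }
    }
}
```

Every open/close path in the caller has to go through acquire() and release() 
correctly, which is exactly the complexity I would like to keep out of Scribe.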

I would like to implement the second option that Doug suggested, i.e. 
"remove the cache, and allocate a new FileSystem instance per call to 
getFileSystem()". However, this would be an incompatible change, because 
pre-existing applications might rely on the current singleton behaviour to 
open file systems. To ensure that existing applications do not break, we can 
introduce a new method, FileSystem.getNewFileSystem(), that skips the CACHE 
and allocates a new FileSystem object every time it is invoked.
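
A toy model of the difference between the cached and the proposed uncached 
lookup, as a sketch only. ToyFileSystem and ToyFileSystems are hypothetical 
classes and do not use Hadoop; get() stands in for the cached FileSystem.get() 
and getNew() for the proposed FileSystem.getNewFileSystem(). Cache eviction 
on close is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a file system client; NOT Hadoop's FileSystem class.
class ToyFileSystem {
    private final String uri;
    private boolean closed = false;

    ToyFileSystem(String uri) { this.uri = uri; }

    // Pretends to open a file; fails if this instance was already closed.
    String open(String path) {
        if (closed) {
            throw new IllegalStateException("file system closed: " + uri);
        }
        return uri + path;
    }

    void close() { closed = true; }

    boolean isClosed() { return closed; }
}

class ToyFileSystems {
    private static final Map<String, ToyFileSystem> CACHE = new HashMap<>();

    // Cached lookup: all callers asking for the same URI share one instance,
    // so a close() by one caller is visible to every other caller.
    static synchronized ToyFileSystem get(String uri) {
        return CACHE.computeIfAbsent(uri, ToyFileSystem::new);
    }

    // Uncached lookup: each call allocates a private instance that the
    // caller can close without affecting anyone else.
    static ToyFileSystem getNew(String uri) {
        return new ToyFileSystem(uri);
    }
}
```

With get(), two threads that ask for the same URI receive the same object, so 
one thread closing it breaks the other. With getNew(), each thread owns its 
instance, and a connect/open/close/disconnect sequence per file is safe.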



> FileSystem.CACHE should be ref-counted
> --------------------------------------
>
>                 Key: HADOOP-4655
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4655
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, fs
>    Affects Versions: 0.18.1, 0.18.2, 0.19.0, 0.19.1, 0.20.0
>            Reporter: Hong Tang
>            Assignee: dhruba borthakur
>
> FileSystem.CACHE is not ref-counted, and could lead to resource leakage.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
