[
https://issues.apache.org/jira/browse/HADOOP-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847478#action_12847478
]
Tsz Wo (Nicholas), SZE commented on HADOOP-6640:
------------------------------------------------
When the FileSystem cache is enabled, FileSystem.get(..) calls
FileSystem.Cache.get(..), which is a synchronized method. If the lookup fails,
a new instance is initialized. Depending on the FileSystem subclass
implementation, the initialization may take a long time. In that case, the
FileSystem.Cache lock is held for the whole initialization, and all calls to
FileSystem.get(..) by other threads are blocked for a long time.
In particular, the DistributedFileSystem initialization may take a long time
since there are retries. It is even worse if the socket timeout is set to a
large value.
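
The locking pattern described above can be illustrated with a minimal sketch.
This is not the actual Hadoop code: BlockingCache and its factory are
hypothetical stand-ins for FileSystem.Cache and for the FileSystem
initialization, used only to show where the lock is held.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for FileSystem.Cache (illustration only).
class BlockingCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    // Stand-in for the FileSystem initialization, which may be slow.
    private final Function<K, V> factory;

    BlockingCache(Function<K, V> factory) {
        this.factory = factory;
    }

    // The method is synchronized, so the cache lock is held for the whole
    // call, including the potentially slow factory call on a cache miss.
    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            // Any other thread calling get(..), even for a different key,
            // waits here until this initialization returns.
            value = factory.apply(key);
            map.put(key, value);
        }
        return value;
    }
}
```

With this shape, one slow initialization for one key delays lookups for every
other key, which matches the behavior reported in the issue.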
There are two possible fixes for the problem:
# (by Sanjay) Change FileSystem.Cache.get(..) so that if the lookup fails, it
first releases the lock, initializes a FileSystem instance, acquires the lock
again, and then adds the instance to the cache. One problem is that if a user
application calls FileSystem.get(..) for the same FileSystem many times within
a short period, many instances may be initialized.
# Change DistributedFileSystem so that it connects lazily: it defers
connecting to the server until the first RPC. A drawback is that this only
fixes DistributedFileSystem but not other FileSystem subclasses.
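
Both proposals can be sketched roughly as follows. These are not patches
against Hadoop: UnlockedInitCache, LazyConnection, factory and connector are
hypothetical names. Returning the already cached instance when another thread
wins the race is an assumption of this sketch; the proposal above only says
that the new instance is added to the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Sketch of fix 1: initialize outside the lock (hypothetical class).
class UnlockedInitCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final Function<K, V> factory; // stand-in for FileSystem initialization

    UnlockedInitCache(Function<K, V> factory) {
        this.factory = factory;
    }

    V get(K key) {
        synchronized (this) {
            V cached = map.get(key);
            if (cached != null) {
                return cached;
            }
        } // lock released before the slow part

        // Slow initialization runs without the cache lock, so several
        // threads may each create an instance for the same key.
        V created = factory.apply(key);

        synchronized (this) {
            V existing = map.get(key);
            if (existing != null) {
                // Assumption: keep the first instance. A real FileSystem
                // would probably need the unused one to be closed.
                return existing;
            }
            map.put(key, created);
            return created;
        }
    }
}

// Sketch of fix 2: defer connecting until first use (hypothetical class).
class LazyConnection<T> {
    private final Supplier<T> connector; // stand-in for connecting to the server
    private T connection;

    LazyConnection(Supplier<T> connector) {
        this.connector = connector; // the constructor does no network work
    }

    synchronized T get() {
        if (connection == null) {
            connection = connector.get(); // first call pays the connect cost
        }
        return connection;
    }
}
```

In the first sketch only the map lookups are guarded, so a slow initialization
for one key no longer blocks lookups for other keys, at the cost of possible
duplicate initializations. In the second, construction stays cheap, so holding
the cache lock during construction would matter less.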
> FileSystem.get() does RPC retries within a static synchronized block
> --------------------------------------------------------------------
>
> Key: HADOOP-6640
> URL: https://issues.apache.org/jira/browse/HADOOP-6640
> Project: Hadoop Common
> Issue Type: Bug
> Environment: all
> Reporter: Alejandro Abdelnur
> Priority: Critical
>
> If using FileSystem.get() in a multithreaded environment, and one get() locks
> because the NN URI is too slow or not responding and retries are in progress,
> all other get() calls (for different users or NNs) are blocked.
> The synchronized block is in the static instance of the Cache inner class.