[ 
https://issues.apache.org/jira/browse/HDFS-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747031#comment-13747031
 ] 

Jing Zhao commented on HDFS-5124:
---------------------------------

Analysis from [~cnauroth]:

"From a very quick scan, it looks to me like it's related to HADOOP-9880.  With 
this patch, we now have a lock ordering conflict around the namesystem lock and 
synchronized methods on the DelegationTokenSecretManager.  Example:

RPC handler thread 1 is running a cancelDelegationToken:
1. Acquire FSNamesystem write lock in FSNamesystem.cancelDelegationToken.
2. Call DelegationTokenSecretManager.cancelToken, which is synchronized.

RPC handler thread 2 is negotiating SASL for a message:
1. Call DelegationTokenSecretManager.retrievePassword, which is synchronized.
2. Acquire FSNamesystem read lock in 
DelegationTokenSecretManager.retrievePassword.

(Same instance of FSNamesystem lock and DelegationTokenSecretManager accessed 
in both threads, with different locking orders.)"
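
The two acquisition orders described above can be sketched in a small standalone Java program. This is an illustration only, not Hadoop code: the class LockOrderSketch and the helper holdThenTry are hypothetical names, a ReentrantReadWriteLock stands in for the FSNamesystem lock, and a ReentrantLock stands in for the monitor taken by the synchronized methods on DelegationTokenSecretManager. Timed tryLock calls are used in place of blocking acquisitions so that the program reports the conflict and exits instead of hanging.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class LockOrderSketch {

    // Takes `first`, waits until the peer thread also holds its first lock,
    // then tries `second` with a timeout. Returns true if `second` was obtained.
    static boolean holdThenTry(Lock first, Lock second, CyclicBarrier barrier) {
        first.lock();
        try {
            barrier.await(5, TimeUnit.SECONDS);   // both threads hold their first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            barrier.await(5, TimeUnit.SECONDS);   // keep `first` held until both attempts end
            return got;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            first.unlock();
        }
    }

    // Returns true when neither thread could take its second lock, i.e. the
    // same interleaving would deadlock with blocking acquisitions.
    static boolean bothThreadsBlocked() {
        // Assumption: stand-in for the FSNamesystem read/write lock.
        ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
        // Assumption: stand-in for the DelegationTokenSecretManager monitor.
        ReentrantLock secretManagerLock = new ReentrantLock();
        CyclicBarrier barrier = new CyclicBarrier(2);
        AtomicBoolean cancelGotSecond = new AtomicBoolean(true);
        AtomicBoolean saslGotSecond = new AtomicBoolean(true);

        // Thread 1 (cancelDelegationToken path): write lock first, then the monitor.
        Thread cancel = new Thread(() -> cancelGotSecond.set(
                holdThenTry(fsLock.writeLock(), secretManagerLock, barrier)));
        // Thread 2 (retrievePassword path): the monitor first, then the read lock.
        Thread sasl = new Thread(() -> saslGotSecond.set(
                holdThenTry(secretManagerLock, fsLock.readLock(), barrier)));

        cancel.start();
        sasl.start();
        try {
            cancel.join();
            sasl.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !cancelGotSecond.get() && !saslGotSecond.get();
    }

    public static void main(String[] args) {
        System.out.println("lock-order inversion reproduced: " + bothThreadsBlocked());
    }
}
```

With blocking acquisitions (lock() and a real synchronized method) neither thread would ever proceed, which matches the reported symptom of a namenode that stops responding after some time in use.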
                
> Namenode in secure cluster deadlocks
> ------------------------------------
>
>                 Key: HDFS-5124
>                 URL: https://issues.apache.org/jira/browse/HDFS-5124
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.1-beta
>         Environment: Secure Hadoop 2 cluster
>            Reporter: Deepesh Khandelwal
>            Assignee: Jing Zhao
>            Priority: Blocker
>         Attachments: nn_jstack.out
>
>
> Namenode deadlocks after a while in use.
