[ 
https://issues.apache.org/jira/browse/HDFS-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747561#comment-13747561
 ] 

Kihwal Lee commented on HDFS-5124:
----------------------------------

We saw a deadlock last night too. I think it's caused by the same issue. It 
happened on an SBN transitioning to active.

{noformat}
"IPC Server handler 16 on 8020":
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.updatePersistedMasterKey(DelegationTokenSecretManager.java:213)
        - waiting to lock <0x000000054fe26a10> (a org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:553)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:183)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:179)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1509)
        at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:489)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:470)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:179)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:891)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1455)
        at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1333)
        - locked <0x000000054fe2b110> (a org.apache.hadoop.hdfs.server.namenode.NameNode)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1057)
        - locked <0x000000054fe253c0> (a org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer)
        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
        at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1509)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
"Socket Reader #2 for port 8020":
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000054fe68830> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1149)
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.checkAvailableForRead(DelegationTokenSecretManager.java:109)
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.retrievePassword(DelegationTokenSecretManager.java:94)
        - locked <0x000000054fe26a10> (a org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager)
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.retrievePassword(DelegationTokenSecretManager.java:53)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:271)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:296)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
        at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
        at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1419)
        at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1305)
        at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1291)
        at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1917)
        at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1794)
{noformat}
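
The two stacks above look like a lock-order inversion: the handler thread wants the DelegationTokenSecretManager monitor (0x000000054fe26a10) that the socket reader holds, while the socket reader is parked on the FSNamesystem read lock. The trace does not show who owns the FSNamesystem write lock; the sketch below assumes the failover thread holds it. It is only an illustration with hypothetical names (LockOrderInversionSketch, fsLock, secretManager), not NameNode code, and the reader uses a timed tryLock so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderInversionSketch {

    /**
     * Runs the two-thread scenario once. Returns true if the reader thread
     * obtained the read lock, false if it timed out while holding the monitor.
     */
    static boolean readerAcquiredReadLock(long timeoutMillis) throws InterruptedException {
        // Stand-in for the fair FSNamesystem lock (FairSync in the trace).
        final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
        // Stand-in for the DelegationTokenSecretManager monitor.
        final Object secretManager = new Object();
        final CountDownLatch writeLockHeld = new CountDownLatch(1);
        final CountDownLatch monitorHeld = new CountDownLatch(1);
        final AtomicBoolean acquired = new AtomicBoolean(false);

        // Assumed order on the failover path: write lock first, then the monitor.
        Thread failover = new Thread(() -> {
            fsLock.writeLock().lock();
            try {
                writeLockHeld.countDown();
                monitorHeld.await(); // wait until the reader owns the monitor
                synchronized (secretManager) {
                    // stand-in for updatePersistedMasterKey()
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                fsLock.writeLock().unlock();
            }
        });

        // Order on the SASL path: monitor first, then the read lock.
        Thread reader = new Thread(() -> {
            try {
                writeLockHeld.await();
                synchronized (secretManager) { // stand-in for retrievePassword()
                    monitorHeld.countDown();
                    // A plain lock() here would block forever; the timeout is demo-only.
                    if (fsLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        acquired.set(true);
                        fsLock.readLock().unlock();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        failover.start();
        reader.start();
        failover.join();
        reader.join();
        return acquired.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader acquired read lock: " + readerAcquiredReadLock(200));
    }
}
```

In this sketch the reader can never get the read lock, because the writer does not release the write lock until it has entered the monitor the reader still owns. With an untimed lock() on the reader side, both threads would wait on each other indefinitely.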
                
> Namenode in secure cluster deadlocks
> ------------------------------------
>
>                 Key: HDFS-5124
>                 URL: https://issues.apache.org/jira/browse/HDFS-5124
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.1.1-beta
>         Environment: Secure Hadoop 2 cluster
>            Reporter: Deepesh Khandelwal
>            Assignee: Jing Zhao
>            Priority: Blocker
>         Attachments: HADOOP-5124.patch, HDFS-5124.001.patch, 
> HDFS-5124.002.patch, nn_jstack.out
>
>
> Namenode deadlocks after a while in use.
