[ https://issues.apache.org/jira/browse/HDFS-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747561#comment-13747561 ]
Kihwal Lee commented on HDFS-5124:
----------------------------------
We saw a deadlock last night too. I think it's caused by the same issue. It
happened on an SBN (standby NameNode) transitioning to active.
{noformat}
"IPC Server handler 16 on 8020":
    at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.updatePersistedMasterKey(DelegationTokenSecretManager.java:213)
    - waiting to lock <0x000000054fe26a10> (a org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:553)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:183)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$1.run(EditLogTailer.java:179)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1509)
    at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:489)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:470)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.catchupDuringFailover(EditLogTailer.java:179)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:891)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1455)
    at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
    at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1333)
    - locked <0x000000054fe2b110> (a org.apache.hadoop.hdfs.server.namenode.NameNode)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1057)
    - locked <0x000000054fe253c0> (a org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer)
    at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
    at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1509)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)

"Socket Reader #2 for port 8020":
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000054fe68830> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readLock(FSNamesystem.java:1149)
    at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.checkAvailableForRead(DelegationTokenSecretManager.java:109)
    at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.retrievePassword(DelegationTokenSecretManager.java:94)
    - locked <0x000000054fe26a10> (a org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager)
    at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenSecretManager.retrievePassword(DelegationTokenSecretManager.java:53)
    at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:271)
    at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:296)
    at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:585)
    at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
    at org.apache.hadoop.ipc.Server$Connection.processSaslMessage(Server.java:1419)
    at org.apache.hadoop.ipc.Server$Connection.saslProcess(Server.java:1305)
    at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1291)
    at org.apache.hadoop.ipc.Server$Connection.processRpcOutOfBandRequest(Server.java:1917)
    at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1794)
{noformat}
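The two stacks above look like a lock-order inversion: the socket reader holds the DelegationTokenSecretManager monitor (0x000000054fe26a10) and parks on the FSNamesystem read lock, while the IPC handler waits for that same monitor from inside edit-log loading. The dump does not show who owns the read/write lock, so the sketch below assumes the handler thread holds the FSNamesystem write lock while tailing edits. It is a standalone illustration, not NameNode code: `LockOrderSketch`, `fsLock`, and `secretManager` are hypothetical stand-ins, and a timed `tryLock` replaces the blocking `readLock()` call so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {
    // Hypothetical stand-in for the FSNamesystem lock (fair, like the FairSync in the dump).
    static final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    // Hypothetical stand-in for the DelegationTokenSecretManager monitor.
    static final Object secretManager = new Object();

    /**
     * Returns true if a thread that already holds the monitor cannot obtain
     * the read lock while another thread holds the write lock.
     */
    static boolean readerBlocksWhileWriteLockHeld() {
        final CountDownLatch monitorHeld = new CountDownLatch(1);
        final boolean[] timedOut = new boolean[1];

        // "Socket reader": takes the monitor first, then wants the read lock.
        Thread reader = new Thread(() -> {
            synchronized (secretManager) {
                monitorHeld.countDown();
                try {
                    // Timed attempt so the sketch ends; the real call blocks indefinitely.
                    boolean got = fsLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                    if (got) {
                        fsLock.readLock().unlock();
                    }
                    timedOut[0] = !got;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        // "IPC handler": assumed to hold the write lock while applying edits.
        fsLock.writeLock().lock();
        try {
            reader.start();
            monitorHeld.await();
            // In the reported trace the handler would now enter a method
            // synchronized on secretManager and never return. Here we only
            // wait for the reader's timed attempt to finish.
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            fsLock.writeLock().unlock();
        }
        return timedOut[0];
    }

    public static void main(String[] args) {
        System.out.println("reader blocked: " + readerBlocksWhileWriteLockHeld());
    }
}
```

If this reading of the dump is right, the usual remedies are to acquire the two locks in one consistent order on every path, or to avoid taking the namesystem lock while holding the secret manager monitor. Which of these the attached patches take is not stated in this message.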
> Namenode in secure cluster deadlocks
> ------------------------------------
>
> Key: HDFS-5124
> URL: https://issues.apache.org/jira/browse/HDFS-5124
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.1.1-beta
> Environment: Secure Hadoop 2 cluster
> Reporter: Deepesh Khandelwal
> Assignee: Jing Zhao
> Priority: Blocker
> Attachments: HADOOP-5124.patch, HDFS-5124.001.patch,
> HDFS-5124.002.patch, nn_jstack.out
>
>
> Namenode deadlocks after a while in use.