[
https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440501#comment-13440501
]
Brandon Li commented on HDFS-3846:
----------------------------------
One example of the deadlock is between the SafeModeMonitor thread and an RPC handler processing a block report:
{noformat}
Thread 16142: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=0, line=4208 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumberOfDatanodes(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=2, line=4202 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumLiveDataNodes() @bci=4, line=4198 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.needEnter() @bci=17, line=4886 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.canLeave() @bci=38, line=4878 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor.run() @bci=27, line=5074 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Thread 16126: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(short) @bci=0, line=4938 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(int) @bci=14, line=5141 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(org.apache.hadoop.hdfs.protocol.Block, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor) @bci=1134, line=3749 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(org.apache.hadoop.hdfs.protocol.DatanodeID, org.apache.hadoop.hdfs.protocol.BlockListAsLongs) @bci=316, line=3548 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration, long[]) @bci=70, line=978 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=87, line=39 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=25 (Interpreted frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=161, line=597 (Interpreted frame)
- org.apache.hadoop.ipc.RPC$Server.call(java.lang.Class, org.apache.hadoop.io.Writable, long) @bci=74, line=578 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=31, line=1388 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=1384 (Interpreted frame)
- java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
- javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
- org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1122 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler.run() @bci=205, line=1382 (Interpreted frame)
{noformat}
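
The two traces look like a classic lock-order inversion: one thread holds the FSNamesystem monitor and wants the SafeModeInfo monitor, while the other holds them in the opposite order. The following is only a minimal standalone sketch of that pattern, not the branch-1 code. The class LockOrderDemo, the two plain Object locks, and the thread names are hypothetical stand-ins. It uses the standard ThreadMXBean.findMonitorDeadlockedThreads() call to confirm that the JVM sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; it does not use any Hadoop code.
class LockOrderDemo {

    // Signals that the caller holds its first lock, then waits for the other thread.
    private static void holdUntilBothReady(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts two daemon threads that take two locks in opposite order and
    // returns how many threads the JVM reports as monitor-deadlocked
    // (0 if no deadlock is detected within about five seconds).
    static int countDeadlockedThreads() {
        final Object namesystemLock = new Object(); // stand-in for the FSNamesystem monitor
        final Object safeModeLock = new Object();   // stand-in for the SafeModeInfo monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like the block report handler: namesystem lock first, then safe mode lock.
        Thread handler = new Thread(() -> {
            synchronized (namesystemLock) {
                holdUntilBothReady(bothHoldFirstLock);
                synchronized (safeModeLock) {
                    // never reached once the deadlock forms
                }
            }
        }, "demo-handler");

        // Like SafeModeMonitor: safe mode lock first, then namesystem lock.
        Thread monitor = new Thread(() -> {
            synchronized (safeModeLock) {
                holdUntilBothReady(bothHoldFirstLock);
                synchronized (namesystemLock) {
                    // never reached once the deadlock forms
                }
            }
        }, "demo-safemode-monitor");

        // Daemon threads, so the stuck threads do not keep the JVM alive.
        handler.setDaemon(true);
        monitor.setDaemon(true);
        handler.start();
        monitor.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In this sketch the latch only makes the interleaving deterministic. In the NameNode the same cycle would depend on timing, which would explain why the hang shows up only occasionally. If both threads took the two locks in the same order, the cycle could not form.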
> Namenode deadlock in branch-1
> -----------------------------
>
> Key: HDFS-3846
> URL: https://issues.apache.org/jira/browse/HDFS-3846
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Brandon Li
>
> Jitendra found the following problem:
> 1. Handler: acquires the namesystem lock, then waits on the SafeModeInfo lock at
> SafeModeInfo.isOn().
> 2. SafeModeMonitor: calls SafeModeInfo.canLeave(), which is synchronized, so the
> SafeModeInfo lock is acquired. This method also triggers the call sequence
> needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() ->
> getDatanodeListForReport(). getDatanodeListForReport() is synchronized on the
> FSNamesystem lock.