[ 
https://issues.apache.org/jira/browse/HDFS-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440501#comment-13440501
 ] 

Brandon Li commented on HDFS-3846:
----------------------------------

One example of the deadlock is between the SafeModeMonitor thread and an IPC handler thread processing a block report:

{noformat}
Thread 16142: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=0, line=4208 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumberOfDatanodes(org.apache.hadoop.hdfs.protocol.FSConstants$DatanodeReportType) @bci=2, line=4202 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNumLiveDataNodes() @bci=4, line=4198 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.needEnter() @bci=17, line=4886 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.canLeave() @bci=38, line=4878 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeMonitor.run() @bci=27, line=5074 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=662 (Interpreted frame)

Thread 16126: (state = BLOCKED)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(short) @bci=0, line=4938 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(int) @bci=14, line=5141 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(org.apache.hadoop.hdfs.protocol.Block, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor, org.apache.hadoop.hdfs.server.namenode.DatanodeDescriptor) @bci=1134, line=3749 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processReport(org.apache.hadoop.hdfs.protocol.DatanodeID, org.apache.hadoop.hdfs.protocol.BlockListAsLongs) @bci=316, line=3548 (Interpreted frame)
- org.apache.hadoop.hdfs.server.namenode.NameNode.blockReport(org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration, long[]) @bci=70, line=978 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method, java.lang.Object, java.lang.Object[]) @bci=0 (Interpreted frame)
- sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=87, line=39 (Interpreted frame)
- sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=25 (Interpreted frame)
- java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=161, line=597 (Interpreted frame)
- org.apache.hadoop.ipc.RPC$Server.call(java.lang.Class, org.apache.hadoop.io.Writable, long) @bci=74, line=578 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=31, line=1388 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=1384 (Interpreted frame)
- java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
- javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
- org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1122 (Interpreted frame)
- org.apache.hadoop.ipc.Server$Handler.run() @bci=205, line=1382 (Interpreted frame)
{noformat}
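
The pattern in the two traces looks like a classic lock-order inversion: the handler thread holds the FSNamesystem lock and wants the SafeModeInfo lock, while the SafeModeMonitor thread holds the SafeModeInfo lock and wants the FSNamesystem lock. Below is a minimal, self-contained sketch of that pattern. It is not the HDFS code: the class LockOrderDemo, its two ReentrantLock fields, and the helper methods are hypothetical stand-ins for the real synchronized monitors, and tryLock with a timeout is used only so the demo reports the conflict instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the two monitors involved; not the real HDFS classes.
class LockOrderDemo {
    static final ReentrantLock namesystemLock = new ReentrantLock(); // plays the FSNamesystem monitor
    static final ReentrantLock safeModeLock = new ReentrantLock();   // plays the SafeModeInfo monitor

    // Takes `first`, waits until both threads hold their first lock, then tries
    // `second` with a timeout. Returns true if `second` could be acquired.
    static boolean holdThenTry(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried)
            throws InterruptedException {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            // Keep `first` held until the other thread has finished its attempt too.
            bothTried.countDown();
            bothTried.await();
            return got;
        } finally {
            first.unlock();
        }
    }

    // Returns true when neither thread could get its second lock, i.e. the two
    // acquisition orders conflict and plain synchronized code would deadlock.
    static boolean inversionDetected() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean handlerGot = new AtomicBoolean(true);
        AtomicBoolean monitorGot = new AtomicBoolean(true);

        // "Handler": namesystem lock first, then safe-mode lock.
        Thread handler = new Thread(() -> {
            try {
                handlerGot.set(holdThenTry(namesystemLock, safeModeLock, bothHoldFirst, bothTried));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // "Monitor": safe-mode lock first, then namesystem lock (the opposite order).
        Thread monitor = new Thread(() -> {
            try {
                monitorGot.set(holdThenTry(safeModeLock, namesystemLock, bothHoldFirst, bothTried));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        handler.start();
        monitor.start();
        try {
            handler.join();
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        }
        return !handlerGot.get() && !monitorGot.get();
    }

    public static void main(String[] args) {
        System.out.println("lock-order inversion detected: " + inversionDetected());
    }
}
```

With real synchronized methods there is no timeout, so both threads would stay BLOCKED indefinitely, as in the dump above. The usual remedy for this kind of inversion is to make every code path take the two locks in the same order, though whether that is the right fix for branch-1 is not something this sketch settles.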


                
> Namenode deadlock in branch-1
> -----------------------------
>
>                 Key: HDFS-3846
>                 URL: https://issues.apache.org/jira/browse/HDFS-3846
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Brandon Li
>
> Jitendra found the following problem:
> 1. Handler: acquires the namesystem lock, then waits on the SafeModeInfo lock 
> at SafeModeInfo.isOn().
> 2. SafeModeMonitor: calls SafeModeInfo.canLeave(), which is synchronized, so 
> the SafeModeInfo lock is acquired. This method also triggers the call 
> sequence needEnter() -> getNumLiveDataNodes() -> getNumberOfDatanodes() -> 
> getDatanodeListForReport(). getDatanodeListForReport() is synchronized on the 
> FSNamesystem lock, so each thread ends up waiting for the lock the other holds.
