[ 
https://issues.apache.org/jira/browse/HADOOP-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657456#action_12657456
 ] 

shv edited comment on HADOOP-4904 at 12/17/08 9:25 AM:
-----------------------------------------------------------------------

Thanks, Koji, for detecting this.
Here's part of the jstack trace.
{code}
"org.apache.hadoop.dfs.fsnamesystem$safemodemoni...@2b7f6b6d":
 at org.apache.hadoop.dfs.FSNamesystem.processMisReplicatedBlocks(FSNamesystem.java:2918)
 - waiting to lock <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
 at org.apache.hadoop.dfs.FSNamesystem.access$800(FSNamesystem.java:72)
 at org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo.leave(FSNamesystem.java:3833)
 - locked <0x0000002d34fb4c80> (a org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo)
 at org.apache.hadoop.dfs.FSNamesystem$SafeModeMonitor.run(FSNamesystem.java:4033)
 at java.lang.Thread.run(Thread.java:619)

"IPC Server handler 38 on 8020":
 at org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo.isOn(FSNamesystem.java:3796)
 - waiting to lock <0x0000002d34fb4c80> (a org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo)
 at org.apache.hadoop.dfs.FSNamesystem.isInSafeMode(FSNamesystem.java:4068)
 at org.apache.hadoop.dfs.FSNamesystem.addStoredBlock(FSNamesystem.java:2820)
 - locked <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
 at org.apache.hadoop.dfs.FSNamesystem.processReport(FSNamesystem.java:2718)
 - locked <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
 at org.apache.hadoop.dfs.NameNode.blockReport(NameNode.java:613)
 at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

Found 1 deadlock.
{code}


> Deadlock while leaving safe mode.
> ---------------------------------
>
>                 Key: HADOOP-4904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4904
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.3
>            Reporter: Konstantin Shvachko
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> {{SafeModeInfo.leave()}} acquires locks in an incorrect order, which causes 
> the deadlock.
> It first acquires the {{SafeModeInfo}} lock, then calls 
> {{FSNamesystem.processMisReplicatedBlocks()}}, which requires the global 
> {{FSNamesystem}} lock.
> It should be the other way around: first {{FSNamesystem}} lock, then 
> {{SafeModeInfo}}.
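
The lock order proposed in the description can be sketched roughly as below. This is only an illustration, not the actual patch: the class LockOrderSketch, its two lock objects, and its fields are hypothetical stand-ins for the FSNamesystem and SafeModeInfo monitors, and the method bodies are placeholders. Both paths take the namesystem lock first and the safe mode lock second, so neither thread can hold one lock while waiting for the other in the opposite order.

```java
class LockOrderSketch {
    // Hypothetical stand-ins for the two monitors seen in the jstack trace.
    private final Object namesystemLock = new Object(); // plays the FSNamesystem lock
    private final Object safeModeLock = new Object();   // plays the SafeModeInfo lock
    private boolean safeModeOn = true;
    private int blocksSeen = 0;

    // Block report path: namesystem lock, then safe mode lock
    // (the same order the IPC handler uses in the trace).
    void addStoredBlock() {
        synchronized (namesystemLock) {
            synchronized (safeModeLock) {
                if (safeModeOn) {
                    blocksSeen++;
                }
            }
        }
    }

    // Leave path in the proposed order: namesystem lock first, then safe mode lock.
    void leaveSafeMode() {
        synchronized (namesystemLock) {
            synchronized (safeModeLock) {
                safeModeOn = false;
            }
            processMisReplicatedBlocks(); // runs with the namesystem lock already held
        }
    }

    // Placeholder for the real work; re-entering a monitor we already hold is safe.
    private void processMisReplicatedBlocks() {
        synchronized (namesystemLock) {
            blocksSeen = 0;
        }
    }

    // Runs both paths concurrently; returns true if both threads finish in time.
    static boolean runConcurrently(int iterations) {
        LockOrderSketch ns = new LockOrderSketch();
        Thread reporter = new Thread(() -> {
            for (int i = 0; i < iterations; i++) ns.addStoredBlock();
        });
        Thread monitor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) ns.leaveSafeMode();
        });
        reporter.setDaemon(true); // do not keep the JVM alive if something hangs
        monitor.setDaemon(true);
        reporter.start();
        monitor.start();
        try {
            reporter.join(5000);
            monitor.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reporter.isAlive() && !monitor.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10000));
    }
}
```

If leaveSafeMode() instead took safeModeLock first and then called processMisReplicatedBlocks(), the two threads could each hold one lock and wait for the other, which is the cycle jstack reports above.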

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
