[ https://issues.apache.org/jira/browse/HADOOP-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657456#action_12657456 ]
Konstantin Shvachko commented on HADOOP-4904:
---------------------------------------------
Thanks, Koji, for detecting this.
Here is part of the jstack trace:
"org.apache.hadoop.dfs.FSNamesystem$SafeModeMoni...@2b7f6b6d":
	at org.apache.hadoop.dfs.FSNamesystem.processMisReplicatedBlocks(FSNamesystem.java:2918)
	- waiting to lock <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
	at org.apache.hadoop.dfs.FSNamesystem.access$800(FSNamesystem.java:72)
	at org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo.leave(FSNamesystem.java:3833)
	- locked <0x0000002d34fb4c80> (a org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo)
	at org.apache.hadoop.dfs.FSNamesystem$SafeModeMonitor.run(FSNamesystem.java:4033)
	at java.lang.Thread.run(Thread.java:619)
"IPC Server handler 38 on 8020":
	at org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo.isOn(FSNamesystem.java:3796)
	- waiting to lock <0x0000002d34fb4c80> (a org.apache.hadoop.dfs.FSNamesystem$SafeModeInfo)
	at org.apache.hadoop.dfs.FSNamesystem.isInSafeMode(FSNamesystem.java:4068)
	at org.apache.hadoop.dfs.FSNamesystem.addStoredBlock(FSNamesystem.java:2820)
	- locked <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
	at org.apache.hadoop.dfs.FSNamesystem.processReport(FSNamesystem.java:2718)
	- locked <0x0000002ada38f558> (a org.apache.hadoop.dfs.FSNamesystem)
	at org.apache.hadoop.dfs.NameNode.blockReport(NameNode.java:613)
	at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
Found 1 deadlock.
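The two threads in the trace each hold one monitor and wait for the other: the safe-mode monitor holds SafeModeInfo and wants FSNamesystem, while the IPC handler holds FSNamesystem and wants SafeModeInfo. The following is a minimal, hypothetical Java reproduction of that lock-order inversion. Two plain Object monitors stand in for the FSNamesystem and SafeModeInfo locks (they are not the real Hadoop classes, and the class and method names below are made up for illustration). The JVM's own detector, ThreadMXBean.findMonitorDeadlockedThreads(), is polled to confirm the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the lock-order inversion in the trace.
// Plain monitors stand in for the FSNamesystem and SafeModeInfo locks.
public class LockInversionDemo {

    // Starts two daemon threads that take the two locks in opposite order.
    static void startInvertedThreads() {
        final Object fsNamesystemLock = new Object();
        final Object safeModeInfoLock = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Like SafeModeInfo.leave(): SafeModeInfo first, then FSNamesystem.
        Thread monitor = new Thread(
            () -> lockInOrder(safeModeInfoLock, fsNamesystemLock, bothHoldFirstLock),
            "SafeModeMonitor");
        // Like processReport()/addStoredBlock(): FSNamesystem first, then SafeModeInfo.
        Thread handler = new Thread(
            () -> lockInOrder(fsNamesystemLock, safeModeInfoLock, bothHoldFirstLock),
            "IPC handler");
        monitor.setDaemon(true); // let the JVM exit despite the deadlock
        handler.setDaemon(true);
        monitor.start();
        handler.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: each thread blocks on the lock the other holds
            }
        }
    }

    // Polls the JVM deadlock detector; returns deadlocked thread ids, or null on timeout.
    static long[] waitForDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        startInvertedThreads();
        long[] ids = waitForDeadlock(5000);
        if (ids == null) {
            System.out.println("no deadlock detected");
            return;
        }
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().getThreadInfo(ids)) {
            System.out.println(info.getThreadName() + " waiting for " + info.getLockName()
                + " held by " + info.getLockOwnerName());
        }
    }
}
```

The latch only makes the interleaving deterministic for the demo; in the NameNode the same cycle appears only when a block report and the exit from safe mode happen to overlap.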
> Deadlock while leaving safe mode.
> ---------------------------------
>
> Key: HADOOP-4904
> URL: https://issues.apache.org/jira/browse/HADOOP-4904
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.18.3
> Reporter: Konstantin Shvachko
> Priority: Blocker
> Fix For: 0.18.3
>
>
> {{SafeModeInfo.leave()}} acquires locks in an incorrect order, which causes
> the deadlock.
> It first acquires the {{SafeModeInfo}} lock, then calls
> {{FSNamesystem.processMisReplicatedBlocks()}}, which requires the global
> {{FSNamesystem}} lock.
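> It should be the other way around: first the {{FSNamesystem}} lock, then the
> {{SafeModeInfo}} lock, which is the order the block-report path
> ({{processReport()}} → {{addStoredBlock()}} → {{isInSafeMode()}}) already uses.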
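A minimal sketch of the corrected ordering follows. It assumes simplified stand-in classes are enough to show the idea: Namesystem, SafeModeInfo and the helper runConcurrently() are hypothetical, only the method names mirror the trace, and this is not the committed patch for HADOOP-4904. In the sketch, leave() takes the global lock first and the SafeModeInfo lock second, and the helper runs the exit-from-safe-mode path and the block-report path side by side to check that both finish.

```java
// Hypothetical, simplified stand-ins for FSNamesystem and SafeModeInfo;
// method names mirror the trace, bodies are illustrative only.
public class LockOrderFix {

    static final class Namesystem {
        final SafeModeInfo safeMode = new SafeModeInfo(this);
        int misReplicatedScans = 0;

        // Needs the global Namesystem lock, like processMisReplicatedBlocks().
        synchronized void processMisReplicatedBlocks() {
            misReplicatedScans++;
        }

        // Block-report path: holds the Namesystem lock, then checks safe mode
        // (Namesystem -> SafeModeInfo), as addStoredBlock() does in the trace.
        synchronized boolean addStoredBlock() {
            return safeMode.isOn();
        }
    }

    static final class SafeModeInfo {
        private final Namesystem ns;
        private boolean on = true;

        SafeModeInfo(Namesystem ns) {
            this.ns = ns;
        }

        synchronized boolean isOn() {
            return on;
        }

        synchronized void enter() {
            on = true;
        }

        // Corrected order: Namesystem lock first, then this SafeModeInfo lock,
        // matching the block-report path, so no wait-for cycle can form.
        void leave() {
            synchronized (ns) {
                synchronized (this) {
                    ns.processMisReplicatedBlocks(); // re-enters the ns monitor
                    on = false;
                }
            }
        }
    }

    // Runs leave()/enter() and addStoredBlock() concurrently; returns true if
    // both threads finish within the timeout (i.e. they did not deadlock).
    static boolean runConcurrently(int iterations, long timeoutMillis) {
        final Namesystem ns = new Namesystem();
        Thread monitor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                ns.safeMode.leave();
                ns.safeMode.enter();
            }
        }, "SafeModeMonitor");
        Thread handler = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                ns.addStoredBlock();
            }
        }, "IPC handler");
        monitor.setDaemon(true);
        handler.setDaemon(true);
        monitor.start();
        handler.start();
        try {
            monitor.join(timeoutMillis);
            handler.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !monitor.isAlive() && !handler.isAlive();
    }

    public static void main(String[] args) {
        boolean ok = runConcurrently(10_000, 10_000);
        System.out.println(ok ? "completed, no deadlock" : "threads still blocked");
    }
}
```

Every thread that needs both monitors now takes them in the order Namesystem, then SafeModeInfo, and enter() takes only the SafeModeInfo lock, so the cycle from the trace above cannot form. Swapping the two synchronized blocks in leave() would bring the inversion back.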
--
This message is automatically generated by JIRA.