[ 
https://issues.apache.org/jira/browse/HADOOP-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611418#action_12611418
 ] 

dhruba borthakur commented on HADOOP-3709:
------------------------------------------

I think this code will make heartbeat processing a *real* bottleneck in 
large clusters, because every heartbeat now requires acquiring the global 
FSNamesystem lock. From this viewpoint it is a *regression*: this kind of 
contention used to show up very easily on a 1000-node cluster. 

> Lock hierarchy violation in namenode while handling heartbeats
> --------------------------------------------------------------
>
>                 Key: HADOOP-3709
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3709
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>
> The heartbeat processing code recently got rearranged via HADOOP-3254. 
> FSNamesystem.handleHeartbeat acquires the heartbeat lock and then invokes 
> blockReportProcessed. This method tries to acquire the global FSNamesystem 
> lock. This is a lock hierarchy violation and can lead to deadlock.
> The heartbeat processing code should acquire only the heartbeat lock. It 
> should not acquire the global lock; otherwise heartbeat processing becomes 
> too heavyweight.
> This code occurs only on trunk and not on the 0.18 branch. Surprise!
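
To illustrate the locking discipline the description asks for, here is a minimal Java sketch. It is not the FSNamesystem code: the class LockOrderSketch, its fields, and its counters are hypothetical stand-ins, and the lock order (global lock first, heartbeat lock second) is an assumption inferred from the description above. The heartbeat path takes only the heartbeat lock; any work that needs the global lock runs on a separate path that acquires the locks in hierarchy order.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names do not correspond to real Hadoop classes.
class LockOrderSketch {
    // Assumed hierarchy: globalLock must be taken before heartbeatLock.
    private final ReentrantLock globalLock = new ReentrantLock();
    private final ReentrantLock heartbeatLock = new ReentrantLock();

    private long heartbeatsSeen = 0;        // guarded by heartbeatLock
    private long blockReportsProcessed = 0; // guarded by globalLock

    // Lightweight path: touches heartbeat state only and never asks for
    // the global lock while the heartbeat lock is held.
    boolean handleHeartbeat(boolean blockReportPending) {
        heartbeatLock.lock();
        try {
            heartbeatsSeen++;
        } finally {
            heartbeatLock.unlock();
        }
        // The caller does any heavyweight follow-up after the lock is released.
        return blockReportPending;
    }

    // Heavyweight path: global lock first, then heartbeat lock,
    // matching the assumed hierarchy.
    void processBlockReport() {
        globalLock.lock();
        try {
            blockReportsProcessed++;
            heartbeatLock.lock();
            try {
                // Heartbeat-related bookkeeping would go here.
            } finally {
                heartbeatLock.unlock();
            }
        } finally {
            globalLock.unlock();
        }
    }

    long heartbeatsSeen() {
        heartbeatLock.lock();
        try {
            return heartbeatsSeen;
        } finally {
            heartbeatLock.unlock();
        }
    }

    long blockReportsProcessed() {
        globalLock.lock();
        try {
            return blockReportsProcessed;
        } finally {
            globalLock.unlock();
        }
    }

    public static void main(String[] args) {
        LockOrderSketch ns = new LockOrderSketch();
        if (ns.handleHeartbeat(true)) {
            ns.processBlockReport(); // runs with no heartbeat lock held
        }
        System.out.println(ns.heartbeatsSeen() + " " + ns.blockReportsProcessed());
    }
}
```

Because every thread acquires the two locks in the same order, no cycle of waiting threads can form between them, and a plain heartbeat never contends for the global lock.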

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
