[
https://issues.apache.org/jira/browse/HADOOP-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611418#action_12611418
]
dhruba borthakur commented on HADOOP-3709:
------------------------------------------
I think this code will make heartbeat processing a *real* bottleneck in
large clusters, because every heartbeat now requires acquiring the global
FSNamesystem lock. From this viewpoint it is a *regression*: this kind of
contention used to show up very easily on a 1000-node cluster.
> Lock hierarchy violation in namenode while handling heartbeats
> --------------------------------------------------------------
>
> Key: HADOOP-3709
> URL: https://issues.apache.org/jira/browse/HADOOP-3709
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
>
> The heartbeat processing code was recently rearranged by HADOOP-3254.
> FSNamesystem.handleHeartbeat acquires the heartbeat lock and then invokes
> blockReportProcessed, which tries to acquire the global FSNamesystem
> lock. This is a lock hierarchy violation and can lead to deadlock.
> The heartbeat processing code should acquire only the heartbeat lock. It
> should not acquire the global lock; otherwise heartbeat processing becomes
> too heavyweight.
> This code occurs only on trunk and not on the 0.18 branch. Surprise!
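
The lock ordering problem described above can be illustrated with a minimal sketch. This is not the actual FSNamesystem code: the class LockOrderSketch, the two ReentrantLock fields, and the helper lockGlobal are hypothetical stand-ins, and the sketch assumes the hierarchy is "global lock before heartbeat lock". Only the method names handleHeartbeat and blockReportProcessed echo the report.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the namenode's two locks; not the real FSNamesystem code.
public class LockOrderSketch {
    // Assumed hierarchy: the global lock ranks above the heartbeat lock, so a
    // thread must never request the global lock while holding the heartbeat lock.
    static final ReentrantLock globalLock = new ReentrantLock();
    static final ReentrantLock heartbeatLock = new ReentrantLock();

    // Acquire the global lock, refusing if that would invert the hierarchy.
    static void lockGlobal() {
        if (heartbeatLock.isHeldByCurrentThread() && !globalLock.isHeldByCurrentThread()) {
            throw new IllegalStateException("lock hierarchy violation: heartbeat -> global");
        }
        globalLock.lock();
    }

    // Stand-in for work that needs the global lock.
    static void blockReportProcessed() {
        lockGlobal();
        try {
            // ... update global namespace state here ...
        } finally {
            globalLock.unlock();
        }
    }

    // Problematic shape: calls into global-lock code while holding the heartbeat lock.
    // Returns false when the ordering check rejects the nested acquisition.
    static boolean handleHeartbeatNested() {
        heartbeatLock.lock();
        try {
            blockReportProcessed();
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            heartbeatLock.unlock();
        }
    }

    // Alternative shape: decide under the heartbeat lock only, release it,
    // and only then do the work that needs the global lock.
    static boolean handleHeartbeatDeferred() {
        boolean needsFollowUp;
        heartbeatLock.lock();
        try {
            needsFollowUp = true; // placeholder for a decision made from heartbeat data
        } finally {
            heartbeatLock.unlock();
        }
        if (needsFollowUp) {
            blockReportProcessed();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("nested ok:   " + handleHeartbeatNested());
        System.out.println("deferred ok: " + handleHeartbeatDeferred());
    }
}
```

In this sketch the nested variant is rejected by the ordering check instead of risking a deadlock against a thread that takes the locks in the opposite order, while the deferred variant keeps the heartbeat path to the heartbeat lock alone. Whether the real fix defers the work or removes it from the heartbeat path entirely is not stated in the report.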