[ https://issues.apache.org/jira/browse/HDFS-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948186#comment-14948186 ]
Hudson commented on HDFS-9137:
------------------------------
FAILURE: Integrated in Hadoop-Mapreduce-trunk #2441 (See
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2441/])
HDFS-9137. DeadLock between DataNode#refreshVolumes and (yliu: rev
35affec38e17e3f9c21d36be111171476072c03f)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
> DeadLock between DataNode#refreshVolumes and
> BPOfferService#registrationSucceeded
> ----------------------------------------------------------------------------------
>
> Key: HDFS-9137
> URL: https://issues.apache.org/jira/browse/HDFS-9137
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.7.1
> Reporter: Uma Maheswara Rao G
> Assignee: Uma Maheswara Rao G
> Fix For: 2.8.0
>
> Attachments: HDFS-9137.00.patch,
> HDFS-9137.01-WithPreservingRootExceptions.patch, HDFSS-9137.02.patch
>
>
> The code paths in DataNode#refreshVolumes and
> BPOfferService#registrationSucceeded can deadlock against each other.
> In practice this should be rare, since the user would have to call
> refreshVolumes while the DN is registering with the NN, but it can happen.
> Reason for the deadlock:
> 1) refreshVolumes is called with the DN lock held, and at the end it also
> triggers a block report. In the block report call,
> BPServiceActor#triggerBlockReport calls toString on bpos, which takes the
> readLock on bpos.
> Lock order: DN lock, then bpos lock.
> 2) BPOfferService#registrationSucceeded takes the writeLock on bpos and
> calls dn.bpRegistrationSucceeded, which is in turn synchronized on the DN.
> Lock order: bpos lock, then DN lock.
> So the two paths take the same two locks in opposite order and can deadlock.
> I think a simple fix would be to move the triggerBlockReport call outside of
> the DN lock; I don't think that call really needs to be made under the DN lock.
> Thoughts?
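
The lock ordering described in the issue can be illustrated with a small standalone sketch. This is not Hadoop code: LockOrderSketch, dnLock, bposLock and the method names are hypothetical stand-ins, and the DN monitor (synchronized methods in the real DataNode) is modelled here as a ReentrantLock so that a timed tryLock can detect the inversion instead of hanging. The first method forces both threads to hold their first lock before trying the second; the second method follows the proposed fix and takes the bpos read lock only after the DN lock has been released.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of the two locks in HDFS-9137; not the actual Hadoop classes. */
public class LockOrderSketch {
  // Assumption: stands in for the DataNode monitor (synchronized in the real code).
  private final Lock dnLock = new ReentrantLock();
  // Assumption: stands in for the read/write lock held inside BPOfferService.
  private final ReentrantReadWriteLock bposLock = new ReentrantReadWriteLock();

  /** Holds 'first' until both threads have tried 'second'; returns whether 'second' was acquired. */
  private static boolean holdThenTry(Lock first, Lock second, CountDownLatch bothHold,
      CountDownLatch bothTried) throws InterruptedException {
    first.lock();
    try {
      bothHold.countDown();
      bothHold.await();                      // both threads now hold their first lock
      boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
      if (acquired) {
        second.unlock();
      }
      bothTried.countDown();
      bothTried.await();                     // keep 'first' until the peer has finished trying
      return acquired;
    } finally {
      first.unlock();
    }
  }

  /** Original ordering: DN then bpos(read) versus bpos(write) then DN. True if both threads got stuck. */
  public boolean invertedOrderBlocks() {
    CountDownLatch bothHold = new CountDownLatch(2);
    CountDownLatch bothTried = new CountDownLatch(2);
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<Boolean> refresh =
          pool.submit(() -> holdThenTry(dnLock, bposLock.readLock(), bothHold, bothTried));
      Future<Boolean> register =
          pool.submit(() -> holdThenTry(bposLock.writeLock(), dnLock, bothHold, bothTried));
      return !refresh.get() && !register.get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  /** Proposed ordering: the block report runs after the DN lock is released. True if both threads finish. */
  public boolean fixedOrderCompletes(int rounds) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Future<?> refresh = pool.submit(() -> {
        for (int i = 0; i < rounds; i++) {
          dnLock.lock();
          try { /* volume refresh work would go here */ } finally { dnLock.unlock(); }
          bposLock.readLock().lock();        // block report, no DN lock held at this point
          try { /* e.g. bpos.toString() */ } finally { bposLock.readLock().unlock(); }
        }
      });
      Future<?> register = pool.submit(() -> {
        for (int i = 0; i < rounds; i++) {
          bposLock.writeLock().lock();
          try {
            dnLock.lock();                   // bpos then DN is safe once the other path no longer nests
            try { /* registration bookkeeping */ } finally { dnLock.unlock(); }
          } finally {
            bposLock.writeLock().unlock();
          }
        }
      });
      refresh.get(10, TimeUnit.SECONDS);
      register.get(10, TimeUnit.SECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    LockOrderSketch sketch = new LockOrderSketch();
    System.out.println("inverted order blocks: " + sketch.invertedOrderBlocks());
    System.out.println("fixed order completes: " + sketch.fixedOrderCompletes(1000));
  }
}
```

In the fixed variant the refresh path never waits for the bpos lock while holding the DN lock, so the circular wait cannot form even though the registration path still nests bpos then DN.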
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)