[
https://issues.apache.org/jira/browse/HDFS-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106502#comment-15106502
]
Kai Zheng commented on HDFS-9661:
---------------------------------
Good catch and nice report!
The patch can solve the deadlock issue. I'm not sure whether there are other
similar cases, or how to prevent this kind of deadlock cleanly.
I wonder if it's possible to consider a unified locking model here. Operations
such as {{FsDatasetImpl#moveBlockAcrossStorage}} and
{{FsDatasetImpl#createRbw}} need to choose a volume, which takes the lock on
{{RoundRobinVolumeChoosingPolicy}}, and then need the lock on {{FsDatasetImpl}}
in {{volume.getAvailable}}. So to avoid this deadlock, maybe each thread should
hold no lock on the {{FsDatasetImpl}} object before the operation, and during
the operation should take the lock on the {{VolumeChoosingPolicy}} first.
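To make the ordering idea concrete, here is a rough sketch. It is not the HDFS code and not the attached patch: {{LockOrderSketch}}, {{policyLock}}, {{datasetLock}} and {{runConcurrently}} are made-up names, with two plain monitor objects standing in for the {{RoundRobinVolumeChoosingPolicy}} and {{FsDatasetImpl}} locks. It only illustrates the rule that every thread enters volume selection without the dataset lock and always acquires the policy lock before the dataset lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in HDFS.
class LockOrderSketch {
    // Stand-in for the monitor of RoundRobinVolumeChoosingPolicy.
    private final Object policyLock = new Object();
    // Stand-in for the monitor of FsDatasetImpl.
    private final Object datasetLock = new Object();
    private final int volumeCount;
    private int nextVolume = 0;
    private int completedOps = 0;

    LockOrderSketch(int volumeCount) {
        this.volumeCount = volumeCount;
    }

    // Callers must NOT hold datasetLock here. The order is always
    // policyLock first, then datasetLock (mimicking volume.getAvailable).
    private int chooseVolume() {
        synchronized (policyLock) {
            synchronized (datasetLock) {
                int chosen = nextVolume;
                nextVolume = (nextVolume + 1) % volumeCount;
                return chosen;
            }
        }
    }

    // Chooses the volume before taking the dataset lock, never while holding it.
    int createRbw() {
        int volume = chooseVolume();
        synchronized (datasetLock) {
            completedOps++;
        }
        return volume;
    }

    // Same ordering as createRbw, so the two cannot wait on each other in a cycle.
    int moveBlockAcrossStorage() {
        int volume = chooseVolume();
        synchronized (datasetLock) {
            completedOps++;
        }
        return volume;
    }

    int completedOps() {
        synchronized (datasetLock) {
            return completedOps;
        }
    }

    // Runs both operations from two threads and returns the number of completed calls.
    static int runConcurrently(int iterations) {
        LockOrderSketch dataset = new LockOrderSketch(3);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                dataset.createRbw();
            }
        });
        pool.submit(() -> {
            for (int i = 0; i < iterations; i++) {
                dataset.moveBlockAcrossStorage();
            }
        });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
                throw new IllegalStateException("operations did not finish; possible deadlock");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return dataset.completedOps();
    }

    public static void main(String[] args) {
        System.out.println("completed operations: " + runConcurrently(1000));
    }
}
```

The deadlock needs one thread holding the dataset lock while waiting for the policy lock, and another holding the policy lock while waiting for the dataset lock. With a single fixed acquisition order that cycle cannot form. The trade-off is that the volume choice happens outside the dataset lock, so the real code would have to tolerate the dataset changing between choosing the volume and using it.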
> Deadlock in DN.FsDatasetImpl between moveBlockAcrossStorage and createRbw
> -------------------------------------------------------------------------
>
> Key: HDFS-9661
> URL: https://issues.apache.org/jira/browse/HDFS-9661
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2
> Reporter: ade
> Assignee: Vinayakumar B
> Fix For: 2.7.2
>
> Attachments: HDFS-9661.0.patch, hdfs-9661-jstack.gif.png
>
>
> We found a deadlock in dn.FsDatasetImpl between moveBlockAcrossStorage and
> createRbw from rpc call: replaceBlock/writeBlock. The dn's jstack result is
> !hdfs-9661-jstack.gif.png!