[ https://issues.apache.org/jira/browse/HDFS-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-4851:
------------------------------
Resolution: Duplicate
Status: Resolved (was: Patch Available)
Duplicate of HDFS-5016, which has been committed and fixes the same deadlock.
> Deadlock in pipeline recovery
> -----------------------------
>
> Key: HDFS-4851
> URL: https://issues.apache.org/jira/browse/HDFS-4851
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.0.4-alpha
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Attachments: hdfs-4851-1.patch
>
>
> Here's a deadlock scenario that cropped up during pipeline recovery, debugged
> through jstacks. Todd tipped me off to this one.
> # Pipeline fails, client initiates recovery. We have the old leftover
> DataXceiver, and a new one doing recovery.
> # New DataXceiver does {{recoverRbw}}, grabbing the {{FsDatasetImpl}} lock
> # Old DataXceiver is in {{BlockReceiver#computePartialChunkCrc}}, calls
> {{FsDatasetImpl#getTmpInputStreams}} and blocks on the {{FsDatasetImpl}} lock.
> # New DataXceiver calls {{ReplicaInPipeline#stopWriter}}, interrupting the old
> DataXceiver and then joining on it.
> # Boom, deadlock. The new DX holds the {{FsDatasetImpl}} lock and is joining on
> the old DX, which is in turn blocked waiting for the same {{FsDatasetImpl}} lock.
> Since monitor entry is not interruptible, the interrupt never wakes the old DX;
> a minimal sketch of the pattern follows this list.
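>
> A minimal, standalone Java sketch of the lock-then-join pattern above. This is
> not HDFS code: the class name, the {{datasetLock}} object, and the thread roles
> are illustrative stand-ins for the {{FsDatasetImpl}} monitor, the new
> DataXceiver (main thread), and the old DataXceiver.
> {code:java}
> public class JoinUnderLockDemo {
>   // Stands in for the FsDatasetImpl monitor; purely illustrative.
>   private static final Object datasetLock = new Object();
>
>   public static void main(String[] args) throws InterruptedException {
>     Thread oldWriter = new Thread(() -> {
>       // Old DataXceiver: tries to enter the monitor (as getTmpInputStreams would)
>       // and blocks, because the main thread already holds it. Monitor entry is
>       // not interruptible, so the interrupt() below never wakes it.
>       synchronized (datasetLock) {
>         System.out.println("old writer entered the monitor (only after main released it)");
>       }
>     }, "old-DataXceiver");
>
>     synchronized (datasetLock) { // new DataXceiver: recoverRbw holds the lock
>       oldWriter.start();
>       Thread.sleep(500);         // let the old writer block on the monitor
>       oldWriter.interrupt();     // stopWriter: interrupt the old writer...
>       oldWriter.join(2000);      // ...then join it
>       System.out.println("old writer still alive after timed join: " + oldWriter.isAlive());
>     }
>   }
> }
> {code}
> The timed {{join(2000)}} is only there so the demo terminates and reports the
> stuck thread; an untimed {{join()}} at that point never returns while the main
> thread still holds the lock, which is the hang described in the last step.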