[ 
https://issues.apache.org/jira/browse/HDFS-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-4851:
------------------------------

    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

Dupe on commit of HDFS-5016, fixes the same deadlock
                
> Deadlock in pipeline recovery
> -----------------------------
>
>                 Key: HDFS-4851
>                 URL: https://issues.apache.org/jira/browse/HDFS-4851
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.0.4-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4851-1.patch
>
>
> Here's a deadlock scenario that cropped up during pipeline recovery, debugged 
> through jstacks. Todd tipped me off to this one.
> # Pipeline fails, client initiates recovery. We have the old leftover 
> DataXceiver, and a new one doing recovery.
> # New DataXceiver does {{recoverRbw}}, grabbing the {{FsDatasetImpl}} lock
> # Old DataXceiver is in {{BlockReceiver#computePartialChunkCrc}}, calls 
> {{FsDatasetImpl#getTmpInputStreams}} and blocks on the {{FsDatasetImpl}} lock.
> # New DataXceiver {{ReplicaInPipeline#stopWriter}}, interrupting the old 
> DataXceiver and then joining on it.
> # Boom, deadlock. New DX holds the {{FsDatasetImpl}} lock and is joining on 
> the old DX, which is in turn waiting on the {{FsDatasetImpl}} lock.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to