[ 
https://issues.apache.org/jira/browse/HDFS-11022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11022:
-----------------------------------
    Attachment: HDFS-11022.png

Attached a diagram in case it makes the scenario easier to understand.

> DataNode unable to remove corrupt block replica due to race condition
> ---------------------------------------------------------------------
>
>                 Key: HDFS-11022
>                 URL: https://issues.apache.org/jira/browse/HDFS-11022
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>         Environment: CDH5.7.0
>            Reporter: Wei-Chiu Chuang
>            Priority: Critical
>         Attachments: HDFS-11022.png
>
>
> Scenario:
> # A client reads a replica blk_A_x from a data node and detects corruption.
> # In the meantime, the replica is appended, updating its generation stamp 
> from x to y.
> # The client tells NN to mark the replica blk_A_x corrupt.
> # NN tells the data node to (1) delete replica blk_A_x and (2) replicate the 
> newer replica blk_A_y from another datanode. Due to block placement policy, 
> blk_A_y is replicated to the same node. (It's a small cluster)
> # DN is unable to receive the newer replica blk_A_y, because the replica 
> already exists.
> # DN is also unable to delete replica blk_A_x, because blk_A_x no longer exists (after the append, the DN holds blk_A_y).
> # The replica on the DN is not part of the data pipeline, so it becomes stale.
> If another replica becomes corrupt and NameNode wants to replicate a healthy 
> replica to this DataNode, it can't, because a stale replica exists. Because 
> this is a small cluster, soon enough (within an hour) no DataNode is 
> able to receive a healthy replica.
> This cluster also suffers from HDFS-11019, so even though the DataNode later 
> detected the data corruption, it was unable to report it to the NameNode.
> Note that we are still investigating the root cause of the corruption.
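
The scenario quoted above can be sketched as a toy model. This is not Hadoop code: the class and method names (StaleReplicaSketch, ToyDataNode, store, append, delete, receive) are hypothetical, and the two rules it encodes are assumptions taken from the description, namely that a delete only succeeds when the generation stamp matches the replica on disk, and that an incoming replica is rejected when a replica of the same block already exists.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the reported race; all names are hypothetical, not Hadoop APIs. */
final class StaleReplicaSketch {

    /** Minimal DataNode stand-in: at most one replica per block ID, tagged with a generation stamp. */
    static final class ToyDataNode {
        // block ID -> generation stamp of the replica currently held
        private final Map<String, Long> replicas = new HashMap<>();

        void store(String blockId, long genStamp) {
            replicas.put(blockId, genStamp);
        }

        /** An append bumps the generation stamp of an existing replica. */
        void append(String blockId, long newGenStamp) {
            if (replicas.containsKey(blockId)) {
                replicas.put(blockId, newGenStamp);
            }
        }

        /** Assumed rule: delete succeeds only if the stored generation stamp matches. */
        boolean delete(String blockId, long genStamp) {
            return replicas.remove(blockId, Long.valueOf(genStamp));
        }

        /** Assumed rule: an incoming replica is rejected if the block already has a replica here. */
        boolean receive(String blockId, long genStamp) {
            return replicas.putIfAbsent(blockId, genStamp) == null;
        }

        Long genStampOf(String blockId) {
            return replicas.get(blockId);
        }
    }

    public static void main(String[] args) {
        final long x = 1001L; // generation stamp seen by the reading client
        final long y = 1002L; // generation stamp after the concurrent append

        ToyDataNode dn = new ToyDataNode();
        dn.store("blk_A", x);   // replica blk_A_x, which the client finds corrupt
        dn.append("blk_A", y);  // meanwhile the replica becomes blk_A_y

        boolean deleted = dn.delete("blk_A", x);    // NN command (1): delete blk_A_x
        boolean received = dn.receive("blk_A", y);  // NN command (2): replicate blk_A_y here

        System.out.println("deleted=" + deleted
                + ", received=" + received
                + ", held genstamp=" + dn.genStampOf("blk_A"));
    }
}
```

In this model both commands fail, and the node keeps the replica with generation stamp y that the NameNode never asked about, which is the stuck state described in steps 5 to 7.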



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
