[ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
-----------------------------------
    Fix Version/s: 3.2.3

> EC: Verify EC reconstruction correctness on DataNode
> ----------------------------------------------------
>
>                 Key: HDFS-15759
>                 URL: https://issues.apache.org/jira/browse/HDFS-15759
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, ec, erasure-coding
>    Affects Versions: 3.4.0
>            Reporter: Toshihiko Uchida
>            Assignee: Toshihiko Uchida
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0, 3.2.3
>
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decoding an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by NameNode.
> The next reconstruction will succeed if the condition triggered the failure 
> is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to