[ 
https://issues.apache.org/jira/browse/HDFS-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9256:
------------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: HDFS-8031)

> Erasure Coding: Improve failure handling of ECWorker striped block 
> reconstruction
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-9256
>                 URL: https://issues.apache.org/jira/browse/HDFS-9256
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>              Labels: hdfs-ec-3.0-nice-to-have
>
> Reconstruction of a missing striped block is a costly operation. It 
> involves the following steps:
> step-1) read the data from the minimum number of sources (remote reads)
> step-2) decode the data for the targets (CPU cycles)
> step-3) transfer the data to the targets (remote writes)
> Assume there is a failure in step-3 because a target DN is disconnected, 
> dead, etc. Presently {{ECWorker}} skips the failed DN and continues 
> transferring data to the other targets. In the next round, it has to start 
> the reconstruction operation again from the first step. Considering the cost 
> of reconstruction, it would be good to give the failed operation another 
> chance by retrying it. The idea of this jira is to discuss the possible 
> approaches and implement one.
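
One possible approach, sketched below for discussion only, is to keep the decoded buffers in memory and repeat only step-3 for a target that failed, so that step-1 and step-2 are not paid for again. This is not the actual {{ECWorker}} code: the class `TransferRetrySketch`, the interface `TargetWriter`, the method `transferWithRetry`, and the `maxAttempts` parameter are all hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of retrying only the transfer step; not Hadoop code. */
public class TransferRetrySketch {

  /** Hypothetical stand-in for writing one decoded buffer to one target DN. */
  public interface TargetWriter {
    void write(int targetIndex, byte[] decoded) throws IOException;
  }

  /**
   * Sends each already-decoded buffer to its target, retrying a failed
   * target up to maxAttempts times. Returns the indices of the targets
   * that still failed after all attempts.
   */
  public static List<Integer> transferWithRetry(
      byte[][] decoded, TargetWriter writer, int maxAttempts) {
    List<Integer> failed = new ArrayList<>();
    for (int i = 0; i < decoded.length; i++) {
      boolean done = false;
      for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
        try {
          writer.write(i, decoded[i]);
          done = true;
        } catch (IOException e) {
          // The decoded buffer is still in memory, so only step-3 is
          // repeated; step-1 (read) and step-2 (decode) are not redone.
        }
      }
      if (!done) {
        failed.add(i);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    byte[][] decoded = { {1, 2}, {3, 4} };
    int[] calls = new int[decoded.length];
    // Simulated writer: target 1 fails on its first attempt only.
    TargetWriter flaky = (idx, buf) -> {
      calls[idx]++;
      if (idx == 1 && calls[idx] == 1) {
        throw new IOException("target DN unreachable");
      }
    };
    List<Integer> failed = transferWithRetry(decoded, flaky, 3);
    System.out.println("targets still failed: " + failed);
  }
}
```

A real implementation would also have to decide how long the decoded buffers may be held, whether to back off between attempts, and whether a retry should go to the same DN or to a newly chosen target. The sketch leaves those questions open.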



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

