[ https://issues.apache.org/jira/browse/HDFS-9256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-9256:
------------------------------
Issue Type: Improvement (was: Sub-task)
Parent: (was: HDFS-8031)
> Erasure Coding: Improve failure handling of ECWorker striped block
> reconstruction
> ---------------------------------------------------------------------------------
>
> Key: HDFS-9256
> URL: https://issues.apache.org/jira/browse/HDFS-9256
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Rakesh R
> Assignee: Rakesh R
> Labels: hdfs-ec-3.0-nice-to-have
>
> Reconstructing a missing striped block is a costly operation. It involves
> the following steps:
> step-1) read the data from the minimum number of sources (remote reads)
> step-2) decode the data for the targets (CPU cycles)
> step-3) transfer the data to the targets (remote writes)
> Assume step-3 fails because a target DN is disconnected, dead, etc.
> Presently, {{ECWorker}} skips the failed DN and continues transferring data
> to the other targets. In the next round, reconstruction has to start again
> from the first step. Considering the cost of reconstruction, it would be
> good to give the failed operation another chance by retrying it. The idea of
> this jira is to discuss the possible approaches and implement one.
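
One possible approach, sketched below purely for discussion: keep the decoded buffers from step-2 and retry only the step-3 transfer for a failed target, so that steps 1 and 2 are not repeated. Every name here (ReconstructionRetrySketch, TargetWriter, transferWithRetry, maxAttempts) is hypothetical and is not part of ECWorker or any Hadoop API. This is a minimal illustration under those assumptions, not a proposed patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types exist in Hadoop. */
final class ReconstructionRetrySketch {

    /** Stand-in for step-3: remotely writing one decoded buffer to a target DN. */
    interface TargetWriter {
        void write(String target, ByteBuffer decoded) throws IOException;
    }

    /**
     * Transfers each already-decoded buffer to its target. A failed target is
     * retried up to maxAttempts times using the same decoded data, so the
     * read (step-1) and decode (step-2) work is not repeated.
     *
     * @return the targets that still failed after all attempts
     */
    static List<String> transferWithRetry(Map<String, ByteBuffer> decodedByTarget,
                                          TargetWriter writer,
                                          int maxAttempts) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, ByteBuffer> entry : decodedByTarget.entrySet()) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    // duplicate() gives the writer its own position/limit, so a
                    // partial write does not consume the retained buffer.
                    writer.write(entry.getKey(), entry.getValue().duplicate());
                    done = true;
                } catch (IOException e) {
                    // Keep the decoded data and try this target again.
                }
            }
            if (!done) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }
}
```

The sketch retries immediately and against the same target. A real design would also have to decide how long decoded buffers may be held in memory, whether to back off between attempts, and whether a replacement target should be chosen when a DN is actually dead.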