Rakesh R created HDFS-9256:
------------------------------
Summary: Erasure Coding: Improve failure handling of ECWorker
striped block reconstruction
Key: HDFS-9256
URL: https://issues.apache.org/jira/browse/HDFS-9256
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Rakesh R
Assignee: Rakesh R
Reconstruction of a missing striped block is a costly operation. It
involves the following steps:
step-1) read the data from the minimum number of sources (remote reads)
step-2) decode the data for the targets (CPU cycles)
step-3) transfer the data to the targets (remote writes)
Assume there is a failure in step-3 because a target DN is disconnected, dead, etc.
Presently {{ECWorker}} skips the failed DN and continues transferring data
to the other targets. In the next round, it has to start the
reconstruction operation again from the first step. Considering the cost of
reconstruction, it would be good to give the failed operation another
chance by retrying it. The idea of this jira is to discuss the possible
approaches and implement one.
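As a starting point for the discussion, one possible approach is to retry only step-3 for a failed target while the decoded buffer is still in memory, so that step-1 and step-2 are not repeated. The following is a minimal sketch of that idea and not the actual {{ECWorker}} code: {{TransferRetrySketch}}, {{TargetWriter}}, {{transferWithRetry}} and {{maxAttempts}} are all hypothetical names introduced here for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names exist in HDFS. */
class TransferRetrySketch {

  /** Stand-in for step-3: writing one decoded buffer to one target DN. */
  interface TargetWriter {
    void write(int targetIndex, byte[] decoded) throws IOException;
  }

  /**
   * Transfers already-decoded buffers to their targets. A target whose
   * write fails is retried up to maxAttempts times in total, reusing the
   * same decoded buffer, before it is reported as failed.
   *
   * @return indices of targets that still failed after all attempts
   */
  static List<Integer> transferWithRetry(byte[][] decoded,
                                         TargetWriter writer,
                                         int maxAttempts) {
    List<Integer> failedTargets = new ArrayList<>();
    for (int i = 0; i < decoded.length; i++) {
      boolean written = false;
      for (int attempt = 1; attempt <= maxAttempts && !written; attempt++) {
        try {
          writer.write(i, decoded[i]);
          written = true;
        } catch (IOException e) {
          // Keep decoded[i] and try again; the read and decode
          // steps are not repeated for this target.
        }
      }
      if (!written) {
        failedTargets.add(i);
      }
    }
    return failedTargets;
  }
}
```

Targets returned in the failed list would still need a full reconstruction in a later round. A real implementation would also have to decide how long to hold the decoded buffers and whether to wait between attempts; this sketch leaves both questions open.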
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)