[ 
https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-10714:
---------------------------------
    Attachment: HADOOP-10714-01-draft.patch

Here is the initial approach, based on option #2 mentioned by [~brahmareddy].

1. DN1->DN2->DN3 is the pipeline.
2. DN3 gets a ChecksumException, sends a CHECKSUM_ERROR ack upstream, 
and shuts itself down.
3. DN2 receives the ack and, before forwarding it upstream, verifies its 
local replica's checksum.
4. If DN2 also finds a checksum error, then DN1 may have one as well. So 
DN2 marks itself CHECKSUM_ERROR too, forwards the ack upstream, and shuts 
itself down.

This way, every DN's replica is verified before the ack reaches the client.
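The steps above can be sketched roughly as follows. This is only an illustrative sketch, not real HDFS code: the class, enum, and method names are made up, and CRC32 stands in for HDFS's per-chunk block checksums.

```java
import java.util.zip.CRC32;

// Illustrative sketch only -- names are hypothetical, and CRC32 stands in
// for HDFS's block checksum verification.
public class ChecksumAckSketch {

  enum AckStatus { SUCCESS, CHECKSUM_ERROR }

  // Recompute the local replica's checksum and compare with the stored value.
  static boolean localReplicaIsCorrupt(byte[] replicaData, long storedChecksum) {
    CRC32 crc = new CRC32();
    crc.update(replicaData, 0, replicaData.length);
    return crc.getValue() != storedChecksum;
  }

  // Steps 3-4: on a CHECKSUM_ERROR ack from downstream, a DN marks itself
  // bad only if its own replica also fails verification; otherwise it just
  // forwards the ack, so only the truly corrupt DNs get replaced.
  static boolean shouldMarkSelfBad(AckStatus downstream, byte[] replicaData,
                                   long storedChecksum) {
    return downstream == AckStatus.CHECKSUM_ERROR
        && localReplicaIsCorrupt(replicaData, storedChecksum);
  }

  public static void main(String[] args) {
    byte[] data = "block-data".getBytes();
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    long good = crc.getValue();

    // DN2 with a healthy replica forwards DN3's error without blaming itself.
    System.out.println(shouldMarkSelfBad(AckStatus.CHECKSUM_ERROR, data, good));     // false
    // DN2 whose replica is also corrupt marks itself CHECKSUM_ERROR as well.
    System.out.println(shouldMarkSelfBad(AckStatus.CHECKSUM_ERROR, data, good + 1)); // true
  }
}
```

Since each upstream DN repeats the same check on receiving the ack, the verification propagates node by node until it reaches the client.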

Please review and give suggestions.

> Issue in handling checksum errors in write pipeline when fault DN is 
> LAST_IN_PIPELINE
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10714
>                 URL: https://issues.apache.org/jira/browse/HDFS-10714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
> We came across an issue where a write fails even though 7 DNs are 
> available, due to a network fault at one datanode which is 
> LAST_IN_PIPELINE. It is similar to HDFS-6937.
> Scenario (DN3 has a network fault and min repl = 2):
> Write pipeline:
> DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad.
> DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad.
> ...
> And so on (each time DN3 is LAST_IN_PIPELINE), until there are no more 
> datanodes left to construct the pipeline.
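The failure mode quoted above can be reproduced with a toy simulation. This is purely illustrative (no real HDFS classes; names are made up): the faulty last node always acks ERROR_CHECKSUM, but recovery blames its upstream neighbour, so healthy DNs are discarded one by one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy simulation (illustrative only, not real HDFS code) of the reported
// failure: recovery repeatedly blames the DN upstream of the faulty last
// node, burning through healthy replacements until the write fails.
public class PipelineBlameSimulation {

  // Returns the DNs wrongly marked bad before the write gives up.
  static List<String> simulate(List<String> pipeline, ArrayDeque<String> spares) {
    List<String> markedBad = new ArrayList<>();
    while (true) {
      // The last DN acks ERROR_CHECKSUM; recovery blames the DN just
      // upstream of it instead of the last DN itself.
      String blamed = pipeline.get(pipeline.size() - 2);
      markedBad.add(blamed);
      if (spares.isEmpty()) {
        return markedBad; // no more datanodes: the write fails
      }
      pipeline.set(pipeline.size() - 2, spares.poll());
    }
  }

  public static void main(String[] args) {
    List<String> pipeline = new ArrayList<>(List.of("DN1", "DN2", "DN3"));
    ArrayDeque<String> spares =
        new ArrayDeque<>(List.of("DN4", "DN5", "DN6", "DN7"));
    // Prints [DN2, DN4, DN5, DN6, DN7] -- DN3, the real culprit, is never blamed.
    System.out.println(simulate(pipeline, spares));
  }
}
```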



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
