Hello

We came across an issue where a write fails even though 7 DNs are available,
because of a network fault at one datanode which is LAST_IN_PIPELINE. It is
similar to HDFS-6937.

Scenario (DN3 has a network fault and min replication = 2):

Write pipeline:
DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN2 is marked as bad
DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, and so DN4 is marked as bad
....
And so on (every time DN3 is LAST_IN_PIPELINE). This continues until there are
no more datanodes left to construct the pipeline.
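
The sequence above can be reproduced with a small stand-alone simulation. This
is only a sketch and not HDFS client code: the class and method names are
hypothetical, and it assumes that DN1 always stays first, the faulty node always
stays last, and the node directly upstream of the faulty node is the one blamed
on every ERROR_CHECKSUM ack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of the scenario; not part of HDFS.
public class PipelineBlameSimulation {

    /** Returns the nodes marked bad before the client runs out of replacements. */
    static List<String> simulate(int totalDatanodes, String faultyNode) {
        // Candidates for the middle slot: every node except DN1 (assumed to stay
        // first) and the faulty node (assumed to stay LAST_IN_PIPELINE).
        Deque<String> spares = new ArrayDeque<>();
        for (int i = 2; i <= totalDatanodes; i++) {
            String dn = "DN" + i;
            if (!dn.equals(faultyNode)) {
                spares.add(dn);
            }
        }

        List<String> markedBad = new ArrayList<>();
        while (!spares.isEmpty()) {
            String middle = spares.poll();
            // The faulty last node acks ERROR_CHECKSUM for data it received,
            // so the healthy upstream node is blamed and replaced.
            markedBad.add(middle);
        }
        // No candidates remain, the faulty node was never removed: the write fails.
        return markedBad;
    }

    public static void main(String[] args) {
        List<String> bad = simulate(7, "DN3");
        System.out.println("Marked bad: " + bad);
        System.out.println("DN3 was never marked bad; no nodes left, write fails");
    }
}
```

With 7 datanodes and DN3 faulty, every healthy candidate for the middle slot is
marked bad while DN3 itself is never suspected.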

We think this could be handled as follows:

Instead of throwing an IOException for an ERROR_CHECKSUM ack from downstream,
the datanode could send the pipeline ack back to the client. On the client side
we could then replace both DN2 and DN3 with new nodes, since we cannot decide
which of the two has the network problem.
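
A minimal sketch of the client-side part of this idea is given below. It is not
the actual HDFS implementation: the class, the method name and the simple queue
of spare nodes are all assumptions made for illustration. It replaces the node
that reported ERROR_CHECKSUM together with its immediate upstream node, and
drops a suspect from the pipeline if no spare is left.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the proposed handling; not part of HDFS.
public class ReplaceBothOnChecksumError {

    /**
     * Builds a new pipeline in which the node that acked ERROR_CHECKSUM
     * (at reporterIndex) and its immediate upstream node are both replaced.
     * If the reporter is the first node, only the reporter is replaced.
     */
    static List<String> rebuild(List<String> pipeline, int reporterIndex,
                                Deque<String> spares) {
        List<String> result = new ArrayList<>(pipeline);
        int from = Math.max(0, reporterIndex - 1);
        // Walk from the higher index down so a removal does not shift
        // the position of the other suspect.
        for (int i = reporterIndex; i >= from; i--) {
            String replacement = spares.poll();
            if (replacement != null) {
                result.set(i, replacement);
            } else {
                result.remove(i); // no spare left: drop the suspect
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> pipeline = Arrays.asList("DN1", "DN2", "DN3");
        Deque<String> spares =
            new ArrayDeque<>(Arrays.asList("DN4", "DN5", "DN6", "DN7"));
        // DN3 (index 2) acked ERROR_CHECKSUM, so DN2 and DN3 are both suspects.
        System.out.println(rebuild(pipeline, 2, spares));
    }
}
```

In the scenario above this removes DN2 and DN3 in a single recovery step, so the
faulty node leaves the pipeline no matter which of the two is actually at fault.
The cost is that one healthy node may be replaced unnecessarily.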


Please give your views on the possible fix.


--Brahma Reddy Battula
