Hi Yongjun,

Thanks a lot for your reply. Yes, it really is a network issue. I will raise a new JIRA.

Thanks and Regards,
Brahma Reddy Battula

> From: yzh...@cloudera.com
> Date: Sat, 30 Jul 2016 10:22:19 -0700
> Subject: Re: Issue in handling checksum errors in write pipeline
> To: brahmareddy.batt...@huawei.com
> CC: hdfs-dev@hadoop.apache.org
>
> Hi Brahma,
>
> Thanks for reporting the issue.
>
> If your problem is really a network issue, then your proposed solution
> sounds reasonable to me, and it's different than what HDFS-6937 intends to
> solve. I think we can create a new jira for your issue. Here is why:
>
> HDFS-6937's scenario is that we keep replacing the third node in recovery,
> and did not detect that the middle node is corrupt. Thus adding a
> corruption checking for the middle node would solve the issue; In your
> case, even if we try to check the middle node, it would appear as not
> corrupt. The problem is that, we don't have a check for network issue (and
> probably adding a network check may not be feasible here).
>
> On the other hand, if it's not a network issue, then it could be caused by
> HDFS-4660, if you don't already have the fix.
>
> Hope my explanation makes sense.
>
> Thanks.
>
> --Yongjun
>
> On Sat, Jul 30, 2016 at 4:03 AM, Brahma Reddy Battula <
> brahmareddy.batt...@huawei.com> wrote:
>
> > Hello
> >
> > We had come across one issue, where write is failed even 7 DN's are
> > available due to network fault at one datanode which is LAST_IN_PIPELINE.
> > It will be similar to HDFS-6937 .
> >
> > Scenario : (DN3 has N/W Fault and Min repl=2).
> >
> > Write pipeline:
> > DN1->DN2->DN3 => DN3 Gives ERROR_CHECKSUM ack. And so DN2 marked as bad
> > DN1->DN4-> DN3 => DN3 Gives ERROR_CHECKSUM ack. And so DN4 is marked as bad
> > ....
> > And so on ( all the times DN3 is LAST_IN_PIPELINE) ... Continued till no
> > more datanodes to construct the pipeline.
> >
> > Thinking we can handle like below:
> >
> > Instead of throwing IOException for ERROR_CHECKSUM ack from downstream, If
> > we can send back the pipeline ack and client side we can replace both DN2
> > and DN3 with new nodes as we can't decide on which is having network
> > problem.
> >
> > Please give you views the possible fix..
> >
> > --Brahma Reddy Battula
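
For illustration only, the scenario and the proposed handling from the quoted mail can be modelled with a small simulation. This is a rough sketch and not HDFS code: the function name simulate, the replace_both flag, and the rule that the last node acks ERROR_CHECKSUM whenever it or its upstream neighbour has the fault are all assumptions made for the example.

```python
def simulate(datanodes, faulty, replace_both):
    """Toy model of write-pipeline recovery (hypothetical, not HDFS code).

    datanodes:    ordered list of node names; the first three form the pipeline
    faulty:       set of nodes that have the network fault
    replace_both: False = replace only the upstream node (behaviour reported
                  in the mail), True = replace upstream and last node (proposal)
    Returns (write_succeeded, nodes_marked_bad).
    """
    pipeline = list(datanodes[:3])
    spare = list(datanodes[3:])
    marked_bad = []
    while True:
        middle, last = pipeline[-2], pipeline[-1]
        # Assumption: the last node acks ERROR_CHECKSUM if either end of the
        # middle->last link is affected by the network fault.
        if middle not in faulty and last not in faulty:
            return True, marked_bad
        suspects = [middle, last] if replace_both else [middle]
        for node in suspects:
            if not spare:
                # No datanode left to rebuild the pipeline: the write fails.
                return False, marked_bad
            marked_bad.append(node)
            pipeline[pipeline.index(node)] = spare.pop(0)


nodes = ["DN1", "DN2", "DN3", "DN4", "DN5", "DN6", "DN7"]
current = simulate(nodes, {"DN3"}, replace_both=False)
proposed = simulate(nodes, {"DN3"}, replace_both=True)
print("replace upstream only:", current)
print("replace both nodes:   ", proposed)
```

In this model, replacing only the upstream node keeps DN3 as LAST_IN_PIPELINE until the spare nodes run out, while replacing both suspects removes DN3 in the first recovery round, at the cost of also discarding the healthy DN2.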