[ https://issues.apache.org/jira/browse/HDFS-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301733#comment-15301733 ]

Yongjun Zhang commented on HDFS-6937:
-------------------------------------

Attached a draft patch. Without the fix, the Error Recovery reports

{code}
2016-05-25 11:35:20,639 [DataStreamer for file 
/tmp/testClientReportBadBlock/CorruptTwoOutOfThreeReplicas1 block 
BP-198184347-127.0.0.1-1464201303610:blk_1073741825_1001] WARN  
hdfs.DataStreamer (DataStreamer.java:handleBadDatanode(1400)) - Error Recovery 
for BP-198184347-127.0.0.1-1464201303610:blk_1073741825_1001 in pipeline 
[DatanodeInfoWithStorage[127.0.0.1:54392,DS-d6b01513-ac11-4fdf-99a1-fbb111d0f0c5,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:45555,DS-67174cd1-1f9c-46bc-9dea-8ece7190308d,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:55877,DS-8fe30be4-1244-4059-a567-bc156d49d01a,DISK]]:
 datanode 
0(DatanodeInfoWithStorage[127.0.0.1:54392,DS-d6b01513-ac11-4fdf-99a1-fbb111d0f0c5,DISK])
 is bad.
{code}
where 127.0.0.1:54392 is DN3 in the reported example.

With the fix, we can see

{code}
2016-05-25 14:15:29,831 [DataXceiver for client 
DFSClient_NONMAPREDUCE_1623233781_1 at /127.0.0.1:59590 [Receiving block 
BP-1085730607-127.0.0.1-1464210923866:blk_1073741825_1001]] WARN  
datanode.DataNode (DataXceiver.java:determineFirstBadLink(494)) - Datanode 2 
got response for connect ack  from downstream datanode with firstbadlink as 
127.0.0.1:36300, however, the replica on the current Datanode host4:38574 is 
found to be corrupted, set the firstBadLink to this DataNode.
{code}
where host4:38574 is DN2 in the example. Thus it is correctly reported as bad 
in the Error Recovery message:

{code}
2016-05-25 14:15:29,833 [DataStreamer for file 
/tmp/testClientReportBadBlock/CorruptTwoOutOfThreeReplicas1 block 
BP-1085730607-127.0.0.1-1464210923866:blk_1073741825_1001] WARN  
hdfs.DataStreamer (DataStreamer.java:handleBadDatanode(1400)) - Error Recovery 
for BP-1085730607-127.0.0.1-1464210923866:blk_1073741825_1001 in pipeline 
[DatanodeInfoWithStorage[127.0.0.1:38574,DS-a743f66a-3379-4a1e-82df-5f6f26815df8,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:38267,DS-32e47236-c29f-435b-995b-f6f4f2a86acc,DISK],
 
DatanodeInfoWithStorage[127.0.0.1:36300,DS-906d823d-0de5-40f1-9409-4c8c6d4edd08,DISK]]:
 datanode 
0(DatanodeInfoWithStorage[127.0.0.1:38574,DS-a743f66a-3379-4a1e-82df-5f6f26815df8,DISK])
 is bad.
{code}
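
For context, here is a minimal, self-contained sketch of the decision the 
DataXceiver log above describes: when the downstream datanode reports a 
firstbadlink for a checksum error, the current datanode first verifies its own 
on-disk replica and, if that copy is corrupt, reports itself as the 
firstBadLink instead of the downstream node. This is illustration only, not 
the actual patch; the class and method names are hypothetical.

{code}
public class FirstBadLinkSketch {

  /**
   * Decide which address to report as firstBadLink. The downstream datanode
   * has already reported a checksum error at downstreamFirstBadLink; before
   * passing that upstream, the caller has verified the local on-disk replica
   * and passes the result in localReplicaCorrupt.
   */
  static String chooseFirstBadLink(String downstreamFirstBadLink,
                                   String localAddr,
                                   boolean localReplicaCorrupt) {
    if (!downstreamFirstBadLink.isEmpty() && localReplicaCorrupt) {
      // The local copy is bad, so the corruption most likely originated on
      // this datanode; blame it rather than the downstream neighbor.
      return localAddr;
    }
    // Local copy looks fine: keep the downstream report (possibly empty).
    return downstreamFirstBadLink;
  }

  public static void main(String[] args) {
    // Mirrors the scenario above: downstream reported 127.0.0.1:36300, but the
    // replica on host4:38574 (DN2) was found to be corrupt.
    System.out.println(
        chooseFirstBadLink("127.0.0.1:36300", "host4:38574", true));
    // prints host4:38574, so DN2 ends up being the node reported as bad
  }
}
{code}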

I was able to see the unit test consistently pass with the fix in one debug 
version, and fail when the fix was removed. However, I have observed some 
intermittency in the unit test environment that is yet to be understood. I'd 
still like to post the patch now to get the review rolling.

Hi [~cmccabe],

Would you please help take a look?

Thanks a lot.




> Another issue in handling checksum errors in write pipeline
> -----------------------------------------------------------
>
>                 Key: HDFS-6937
>                 URL: https://issues.apache.org/jira/browse/HDFS-6937
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs-client
>    Affects Versions: 2.5.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6937.001.patch
>
>
> Given a write pipeline:
> DN1 -> DN2 -> DN3
> DN3 detects a checksum error and terminates; DN2 truncates its replica to the 
> ACKed size. Then a new pipeline is attempted as
> DN1 -> DN2 -> DN4
> DN4 detects a checksum error again. Later, when DN4 is replaced with DN5 (and 
> so on), the pipeline fails for the same reason. This leads to the observation 
> that DN2's data is corrupted.
> Found that the software currently truncates DN2's replica to the ACKed size 
> after DN3 terminates, but it doesn't check the correctness of the data 
> already written to disk.
> So intuitively, a solution would be: when the downstream DN (DN3 here) finds a 
> checksum error, it propagates this info back to the upstream DN (DN2 here); 
> DN2 then checks the correctness of the data already written to disk and 
> truncates the replica to MIN(correctDataSize, ACKedSize). (A rough sketch of 
> this idea appears after the quoted description below.)
> Found this issue is similar to what was reported in HDFS-3875, and the 
> truncation at DN2 was actually introduced as part of the HDFS-3875 solution. 
> Filing this jira for the issue reported here. HDFS-3875 was filed by 
> [~tlipcon], who proposed something similar there:
> {quote}
> if the tail node in the pipeline detects a checksum error, then it returns a 
> special error code back up the pipeline indicating this (rather than just 
> disconnecting)
> if a non-tail node receives this error code, then it immediately scans its 
> own block on disk (from the beginning up through the last acked length). If 
> it detects a corruption on its local copy, then it should assume that it is 
> the faulty one, rather than the downstream neighbor. If it detects no 
> corruption, then the faulty node is either the downstream mirror or the 
> network link between the two, and the current behavior is reasonable.
> {quote}
> Thanks.
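
For illustration only, here is a minimal standalone sketch of the 
MIN(correctDataSize, ACKedSize) idea quoted above: verify the on-disk replica 
chunk by chunk against its stored checksums, take the length of the longest 
valid prefix as correctDataSize, and truncate to the smaller of that and the 
ACKed size. All names are hypothetical; this is not the actual HDFS code.

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

public class ReplicaTruncateSketch {

  static final int CHUNK_SIZE = 512;            // bytes covered by one checksum

  /** Length of the longest prefix of data whose per-chunk CRCs all match. */
  static long correctDataSize(byte[] data, long[] storedCrcs) {
    CRC32 crc = new CRC32();
    long valid = 0;
    for (int chunk = 0; chunk * CHUNK_SIZE < data.length; chunk++) {
      int off = chunk * CHUNK_SIZE;
      int len = Math.min(CHUNK_SIZE, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      if (chunk >= storedCrcs.length || crc.getValue() != storedCrcs[chunk]) {
        break;                                  // first corrupt chunk found
      }
      valid += len;
    }
    return valid;
  }

  /** Truncate the replica file to MIN(correctDataSize, ackedSize). */
  static void truncate(RandomAccessFile replicaFile, long correctDataSize,
                       long ackedSize) throws IOException {
    replicaFile.setLength(Math.min(correctDataSize, ackedSize));
  }
}
{code}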


