[ https://issues.apache.org/jira/browse/HDFS-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414165#comment-15414165 ]

Wei-Chiu Chuang commented on HDFS-8224:
---------------------------------------

Thanks for the patch! Overall it looks good to me. I'd just like to add a few 
nits:

* Can you move the test to TestDiskError.java? That way, you do not need to 
make {{BlockScanner#markSuspectBlock}}, {{DataNode#setBlockScanner}} and 
{{DataNode#transferBlock}} public. I also think it is more natural to place 
this test in that test class.

* {code:title=InvalidChecksumSizeException.java}
 * Thrown when bytesPerChecksum field in the meta file is less than
 * or equal to 0.
{code}
To be more precise, the exception can also be thrown if the checksum type is invalid.
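For example, the javadoc could read along these lines (a suggested wording only, not the final text):
{code:title=InvalidChecksumSizeException.java}
/**
 * Thrown when the checksum type in the meta file is invalid, or when
 * the bytesPerChecksum field is less than or equal to 0.
 */
{code}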

* The following line should be removed.
{code:title=TestDataTransferProtocol.java}
//config.setLong(DFS_DATANODE_SCAN_PERIOD_HOURS_KEY, -1);
{code}

* Finally, could you add a comment here that basically says: if the peer 
disconnects, the block has already been added to the BlockScanner, so it is 
not added to the scan queue again; however, an InvalidChecksumSizeException is 
thrown because the meta file is corrupt (caused by a flaky disk), and 
therefore the block is added to the scan queue here. See the sketch after the 
snippet below.
{code:title=DataNode.java}
      } catch (IOException ie) {
        if (ie instanceof InvalidChecksumSizeException) {
 
{code}
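For example (a sketch only; the exact call and its arguments should follow 
whatever the patch actually does, e.g. {{BlockScanner#markSuspectBlock}}):
{code:title=DataNode.java}
      } catch (IOException ie) {
        if (ie instanceof InvalidChecksumSizeException) {
          // The meta file itself is corrupt (e.g. due to a flaky disk),
          // so mark the block as suspect here so that the scanner
          // verifies it. If the peer disconnects instead, the block has
          // already been added to the BlockScanner, so it must not be
          // queued again.
          blockScanner.markSuspectBlock(...); // illustrative; see patch
        }
{code}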

> Any IOException in DataTransfer#run() will run diskError thread even if it is 
> not disk error
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8224
>                 URL: https://issues.apache.org/jira/browse/HDFS-8224
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>             Fix For: 2.8.0
>
>         Attachments: HDFS-8224-trunk-1.patch, HDFS-8224-trunk-2.patch, 
> HDFS-8224-trunk.patch
>
>
> This happened in our 2.6 cluster.
> One of the blocks and its metadata file were corrupted.
> The disk was healthy in this case.
> Only the block was corrupt.
> Namenode tried to copy that block to another datanode but failed with the 
> following stack trace:
> 2015-04-20 01:04:04,421 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN 
> datanode.DataNode: DatanodeRegistration(a.b.c.d, 
> datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006, 
> infoSecurePort=0, ipcPort=8020, 
> storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed
>  to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to 
> a1.b1.c1.d1:1004 got 
> java.io.IOException: Could not create DataChecksum of type 0 with 
> bytesPerChecksum 0
>         at 
> org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:287)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
>         at java.lang.Thread.run(Thread.java:722)
> The following catch block in the DataTransfer#run method will treat every 
> IOException as a disk fault and run the disk error check:
> {noformat}
> catch (IOException ie) {
>         LOG.warn(bpReg + ":Failed to transfer " + b + " to " +
>             targets[0] + " got ", ie);
>         // check if there are any disk problem
>         checkDiskErrorAsync();
>       } 
> {noformat}
> This block was never scanned by BlockPoolSliceScanner; otherwise it would 
> have been reported as a corrupt block.


