[ https://issues.apache.org/jira/browse/HDFS-16161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-16161:
-----------------------------------
    Description: 
One of our users reported this error in the log:

{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
        at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
        at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}


Analysis:
It looks like the first few bytes of the block's checksum (meta) file were
bad. These bytes determine the checksum type (CRC32, CRC32C, etc.). However,
the block was never reported to the NameNode, so it was never removed.
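A minimal sketch of how a single corrupted header byte can produce the
exception above. It assumes a meta header layout of a 2-byte version, a 1-byte
checksum type and a 4-byte bytesPerChecksum, and uses a hypothetical stand-in
enum (not Hadoop's actual DataChecksum.Type) whose five constants match the
range [0, 5) in the log. Because Java bytes are signed, an on-disk value of
0xD2 is read back as -46.

```java
import java.nio.ByteBuffer;

public class ChecksumHeaderSketch {
    // Hypothetical stand-in for DataChecksum.Type; five constants are assumed
    // so that valid ids fall in [0, 5), as in the log message.
    enum Type {
        NULL, CRC32, CRC32C, DEFAULT, MIXED;

        static Type valueOf(int id) {
            if (id < 0 || id >= values().length) {
                throw new IllegalArgumentException(
                    "id=" + id + " out of range [0, " + values().length + ")");
            }
            return values()[id];
        }
    }

    // Assumed header layout: 2-byte version, 1-byte checksum type,
    // 4-byte bytesPerChecksum (big-endian).
    static Type readChecksumType(byte[] header) {
        byte typeId = header[2]; // signed: 0xD2 on disk reads as -46
        return Type.valueOf(typeId);
    }

    public static void main(String[] args) {
        byte[] good = ByteBuffer.allocate(7)
            .putShort((short) 1).put((byte) 2).putInt(512).array();
        System.out.println(readChecksumType(good)); // CRC32C

        byte[] bad = good.clone();
        bad[2] = (byte) 0xD2; // simulate on-disk corruption of the type byte
        try {
            readChecksumType(bad);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // id=-46 out of range [0, 5)
        }
    }
}
```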

If the DN throws an IOException while reading a block, it starts another
thread to scan the block. If the block is indeed bad, the DN reports the bad
block to the NN. But IllegalArgumentException is a RuntimeException, not an
IOException, so it is not handled that way.
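A simplified, hypothetical model of the handling described above. The names
handleRead, BlockReader and scheduledScans are illustrative only and are not
DataNode code. Only an IOException schedules a block scan; a RuntimeException
such as IllegalArgumentException falls into a generic catch, which here stands
in for DataXceiver logging the error and moving on.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadBlockHandlingSketch {
    // Hypothetical record of blocks queued for a follow-up scan.
    static final List<String> scheduledScans = new ArrayList<>();

    interface BlockReader {
        void read(String blockId) throws IOException;
    }

    // Only IOException triggers the scan that could report a bad block to the NN.
    static String handleRead(String blockId, BlockReader reader) {
        try {
            reader.read(blockId);
            return "ok";
        } catch (IOException e) {
            scheduledScans.add(blockId);
            return "io-error: scan scheduled";
        } catch (RuntimeException e) {
            // IllegalArgumentException ends up here: logged, but no scan is scheduled.
            return "runtime-error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(handleRead("blk_1",
            id -> { throw new IOException("checksum mismatch"); }));
        System.out.println(handleRead("blk_2",
            id -> { throw new IllegalArgumentException("id=-46 out of range [0, 5)"); }));
        System.out.println("scans scheduled: " + scheduledScans); // [blk_1]
    }
}
```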

It's a bug in the error handling code, which should handle this case more
gracefully.

Suggestion: catch the IllegalArgumentException in
BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException instead,
so that the DN catches the exception and performs the regular block scan check.
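A sketch of the suggested change, using local stand-ins rather than the real
Hadoop classes: preadHeaderSketch plays the role of
BlockMetadataHeader.preadHeader(), checksumTypeOf stands in for
DataChecksum.Type.valueOf(int), and CorruptMetaHeaderException is declared
locally and assumed to extend IOException, so the translated exception reaches
the DN's existing IOException path.

```java
import java.io.IOException;

public class PreadHeaderFixSketch {
    // Local stand-in; assumed to extend IOException so that the DN's
    // IOException handling (block scan, bad block report) applies.
    static class CorruptMetaHeaderException extends IOException {
        CorruptMetaHeaderException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Hypothetical stand-in for DataChecksum.Type.valueOf(int).
    static int checksumTypeOf(int id) {
        if (id < 0 || id >= 5) {
            throw new IllegalArgumentException("id=" + id + " out of range [0, 5)");
        }
        return id;
    }

    // Translate the unchecked IllegalArgumentException into an
    // IOException-based exception, keeping the original as the cause.
    static int preadHeaderSketch(byte[] header) throws IOException {
        try {
            return checksumTypeOf(header[2]);
        } catch (IllegalArgumentException e) {
            throw new CorruptMetaHeaderException(
                "Invalid checksum type in block meta header", e);
        }
    }

    public static void main(String[] args) {
        try {
            preadHeaderSketch(new byte[] {0, 1, (byte) 0xD2, 0, 0, 2, 0});
        } catch (IOException e) {
            // This now lands on the IOException path instead of escaping as a RuntimeException.
            System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage()
                + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

Keeping the IllegalArgumentException as the cause preserves the "id=-46 out of
range" detail in the log while changing only the exception type the caller sees.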

  was:
One of our users reported this error in the log:

{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
        at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
        at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}

Analysis:
It looks like the first few bytes of the block's checksum (meta) file were
bad. These bytes determine the checksum type (CRC32, CRC32C, etc.).

If the DN throws an IOException while reading a block, it starts another
thread to scan the block. If the block is indeed bad, the DN reports the bad
block to the NN. But IllegalArgumentException is a RuntimeException, not an
IOException, so it is not handled that way.

It's a bug in the error handling code, which should handle this case more
gracefully.

Suggestion: catch the IllegalArgumentException in
BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException instead,
so that the DN catches the exception and performs the regular block scan check.


> Corrupt block checksum is not reported to NameNode
> --------------------------------------------------
>
>                 Key: HDFS-16161
>                 URL: https://issues.apache.org/jira/browse/HDFS-16161
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
