Wei-Chiu Chuang created HDFS-16161:
--------------------------------------
Summary: Corrupt block checksum is not reported to NameNode
Key: HDFS-16161
URL: https://issues.apache.org/jira/browse/HDFS-16161
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Reporter: Wei-Chiu Chuang
One of our users reported this error in the log:
{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK
operation src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
at
org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
at
org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}
Analysis:
It looks like the first few bytes of the checksum were bad. The first few
bytes determine the checksum type (CRC32, CRC32C, etc.).
If the DN throws an IOException while reading a block, it starts another
thread to scan the block. If the block is indeed bad, it tells the NN it has
a bad block. But this is an IllegalArgumentException, which is a
RuntimeException, not an IOException, so it's not handled that way.
It's a bug in the error handling code, which should be made more graceful.
Suggest: catch the IllegalArgumentException in
BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException, so
that the DN catches the exception and performs the regular block scan check.
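The proposed fix can be sketched as follows. This is a standalone,
simplified illustration, not the actual Hadoop code: the class and method
names MetaHeaderParser, checksumTypeOf, and parseChecksumType are
hypothetical stand-ins for DataChecksum.Type.valueOf() and
BlockMetadataHeader.preadHeader(); the point is wrapping the
RuntimeException in a checked IOException subclass so the DN's existing
IOException handling kicks in.

```java
import java.io.IOException;

// Hypothetical IOException subclass, as proposed above. Because it extends
// IOException, the DataNode's existing READ_BLOCK error path would treat it
// like any other read failure and trigger the regular block scan.
class CorruptMetaHeaderException extends IOException {
    CorruptMetaHeaderException(String msg, Throwable cause) {
        super(msg, cause);
    }
}

class MetaHeaderParser {
    // Mirrors the behavior seen in the stack trace: valid checksum type ids
    // are [0, 5); anything else throws IllegalArgumentException.
    static String checksumTypeOf(int id) {
        if (id < 0 || id >= 5) {
            throw new IllegalArgumentException(
                "id=" + id + " out of range [0, 5)");
        }
        return new String[]{"NULL", "CRC32", "CRC32C", "DEFAULT", "MIXED"}[id];
    }

    // Proposed handling: catch the RuntimeException at the header-parsing
    // boundary and rethrow as a checked IOException subclass.
    static String parseChecksumType(byte rawId)
            throws CorruptMetaHeaderException {
        try {
            return checksumTypeOf(rawId);
        } catch (IllegalArgumentException iae) {
            throw new CorruptMetaHeaderException(
                "Corrupt checksum type in meta header: " + iae.getMessage(),
                iae);
        }
    }
}
```

With this in place, a corrupt type byte such as -46 (from the log above)
surfaces as a CorruptMetaHeaderException instead of an unhandled
IllegalArgumentException, so the DN can report the bad block to the NN.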
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]