[ 
https://issues.apache.org/jira/browse/HDFS-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762160#comment-17762160
 ] 

Bryan Beaudreault commented on HDFS-17179:
------------------------------------------

This is only a problem with short-circuit reads. A normal read sends an 
Op.READ_BLOCK request to the DataNode. When the DataNode processes this 
request, it similarly calls BlockMetadataHeader.readHeader and hits a 
CorruptMetaHeaderException. The DataNode handles that exception and reports 
the block to the NameNode 
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L4083].
 Here's an example stack trace of that:
{code:java}
2023-09-02 21:44:04,414 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: xxxxxx:50010:DataXceiver error processing READ_BLOCK operation  src: /xxxxxx:4556 dst: /xxxxxx:50010
org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block meta file header is corrupt
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:191)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:147)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:100)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:335)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:596)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:292)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.util.InvalidChecksumSizeException: The value 84 does not map to a valid checksum Type
        at org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190)
        at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:177)
        at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:189)
        ... 8 more
2023-09-02 21:44:04,415 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1284711166-xxxxxx-1647379306520:blk_1403927728_330193265 on /mnt/hdfs/data 
{code}
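
To make the intended client-side behaviour concrete, here is a minimal, self-contained sketch of it. It does not use Hadoop: the two exception classes are stand-ins that only borrow the names of the real ones, and readChecksumType, readFromReplicas and the corruptReplicas list are hypothetical helpers invented for illustration. The header layout (2-byte version, 1-byte checksum type, 4-byte bytesPerChecksum) and the valid type range 0..4 are my assumptions about the meta file format, not something shown in the logs here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: nothing here is Hadoop code; all names are illustrative.
class MetaHeaderSketch {

  /** Stand-in for Hadoop's CorruptMetaHeaderException (same name, not the real class). */
  static class CorruptMetaHeaderException extends IOException {
    CorruptMetaHeaderException(String msg, Throwable cause) { super(msg, cause); }
  }

  /** Stand-in for Hadoop's ChecksumException (same name, not the real class). */
  static class ChecksumException extends IOException {
    ChecksumException(String msg) { super(msg); }
  }

  /**
   * Hypothetical header parser. Assumed layout: 2-byte version,
   * 1-byte checksum type, 4-byte bytesPerChecksum.
   */
  static int readChecksumType(byte[] meta) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(meta));
    in.readShort();            // version, ignored in this sketch
    int type = in.readByte();  // signed read, so a corrupt byte may be negative
    in.readInt();              // bytesPerChecksum, ignored in this sketch
    if (type < 0 || type > 4) { // assumed range of valid checksum type ids
      throw new CorruptMetaHeaderException(
          "The block meta file header is corrupt",
          new IllegalArgumentException(
              "The value " + type + " does not map to a valid checksum Type"));
    }
    return type;
  }

  /**
   * Hypothetical client read path: try replicas in order and record every
   * replica whose data or meta header is corrupt, treating both exception
   * types the same way. A real client would report these to the NameNode.
   */
  static int readFromReplicas(List<byte[]> replicaMetas, List<Integer> corruptReplicas)
      throws IOException {
    for (int i = 0; i < replicaMetas.size(); i++) {
      try {
        return readChecksumType(replicaMetas.get(i));
      } catch (ChecksumException | CorruptMetaHeaderException e) {
        corruptReplicas.add(i); // remember the bad replica, then try the next one
      }
    }
    throw new IOException("No healthy replica found");
  }

  public static void main(String[] args) throws IOException {
    byte[] corrupt = {0, 1, 84, 0, 0, 2, 0}; // type byte 84, as in the DataNode log
    byte[] healthy = {0, 1, 2, 0, 0, 2, 0};  // type id 2, 512 bytes per checksum
    List<Integer> bad = new ArrayList<>();
    int type = readFromReplicas(List.of(corrupt, healthy), bad);
    System.out.println("type=" + type + " corruptReplicas=" + bad);
  }
}
```

The point of the sketch is only the multi-catch: a corrupt meta header is recorded as a corrupt replica and the read moves on to the next replica, instead of the exception propagating to the caller unreported.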

> DFSInputStream should report CorruptMetaHeaderException as corruptBlock to 
> NameNode
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-17179
>                 URL: https://issues.apache.org/jira/browse/HDFS-17179
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>
> We've been running into some data corruption issues recently. When a 
> ChecksumException is thrown, DFSInputStream correctly reports the block to 
> the NameNode, which triggers deletion and re-replication of the replica. It's 
> also possible to fail earlier, while reading the meta header needed to 
> construct the checksum. That failure is thrown as CorruptMetaHeaderException, 
> which DFSInputStream does not handle. We should handle it the same way as 
> ChecksumException. See the stack trace:
>  
> {code:java}
> org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block meta file header is corrupt
>       at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:133) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:129) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:618) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1160) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1132) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1128) ~[hadoop-hdfs-client-3.3.1.jar:?]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
>       at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: org.apache.hadoop.util.InvalidChecksumSizeException: The value -75 does not map to a valid checksum Type
>       at org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190) ~[hadoop-common-3.3.1.jar:?]
>       at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:159) ~[hadoop-common-3.3.1.jar:?]
>       at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:131) ~[hadoop-hdfs-client-3.3.1.jar:?] {code}



