Bryan Beaudreault created HDFS-17179:
----------------------------------------

             Summary: DFSInputStream should report CorruptMetaHeaderException as corruptBlock to NameNode
                 Key: HDFS-17179
                 URL: https://issues.apache.org/jira/browse/HDFS-17179
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Bryan Beaudreault
            Assignee: Bryan Beaudreault


We've been running into some data corruption issues recently. When a 
ChecksumException is thrown, DFSInputStream correctly reports the block to the 
NameNode, which triggers deletion and re-replication of the corrupt replica. 
It's also possible to fail even earlier, when reading the meta header that is 
used to construct the checksum. That failure surfaces as a 
CorruptMetaHeaderException, which DFSInputStream does not handle. We should 
handle it the same way as ChecksumException (one possible approach is sketched 
after the stack trace below). See stacktrace:

 
{code:java}
org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block meta file header is corrupt
	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:133) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:129) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:618) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1160) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1132) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1128) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: org.apache.hadoop.util.InvalidChecksumSizeException: The value -75 does not map to a valid checksum Type
	at org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190) ~[hadoop-common-3.3.1.jar:?]
	at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:159) ~[hadoop-common-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:131) ~[hadoop-hdfs-client-3.3.1.jar:?]
{code}
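To make the intended handling concrete, here is a minimal hypothetical sketch 
(the class and method names below are mine, not from any patch): a predicate 
that classifies both exception types as replica corruption, so the existing 
corrupt-block reporting path that ChecksumException already takes could be 
reused for CorruptMetaHeaderException. The real change would live in 
DFSInputStream's catch blocks (e.g. around the getBlockReader() / 
actualGetFromOneDataNode() frames in the trace above), but the classification 
is the crux:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException;

/**
 * Hypothetical helper, not the committed fix: DFSInputStream's read paths
 * could use a predicate like this so that a CorruptMetaHeaderException from
 * BlockMetadataHeader.preadHeader() funnels into the same "record the corrupt
 * replica and report it to the NameNode" handling that ChecksumException
 * already receives.
 */
public final class ReplicaCorruptionCheck {
  private ReplicaCorruptionCheck() {}

  /** Returns true if the failure indicates an on-disk corrupt replica. */
  public static boolean indicatesCorruptReplica(IOException e) {
    // ChecksumException: the block data failed checksum verification.
    // CorruptMetaHeaderException: the meta file header itself is unreadable,
    // so the checksum could not even be constructed.
    return e instanceof ChecksumException
        || e instanceof CorruptMetaHeaderException;
  }
}
{code}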


