Bryan Beaudreault created HDFS-17179:
----------------------------------------
Summary: DFSInputStream should report CorruptMetaHeaderException as corruptBlock to NameNode
Key: HDFS-17179
URL: https://issues.apache.org/jira/browse/HDFS-17179
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Bryan Beaudreault
Assignee: Bryan Beaudreault
We've been running into some data corruption issues recently. When a
ChecksumException is thrown, DFSInputStream correctly reports the block to the
NameNode, which triggers deletion and re-replication of the corrupt replica.
However, a read can also fail before any checksum is computed, while reading
the meta file header itself. That failure surfaces as
CorruptMetaHeaderException, which DFSInputStream does not handle, so the
corrupt replica is never reported. We should handle it the same way as
ChecksumException. See stacktrace:
{code:java}
org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block meta file header is corrupt
	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:133) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:129) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:618) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1160) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1132) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1128) ~[hadoop-hdfs-client-3.3.1.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: org.apache.hadoop.util.InvalidChecksumSizeException: The value -75 does not map to a valid checksum Type
	at org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190) ~[hadoop-common-3.3.1.jar:?]
	at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:159) ~[hadoop-common-3.3.1.jar:?]
	at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:131) ~[hadoop-hdfs-client-3.3.1.jar:?]
{code}