[ https://issues.apache.org/jira/browse/HDFS-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903359#comment-16903359 ]
Stephen O'Donnell commented on HDFS-14706:
------------------------------------------
Uploaded an initial patch to see whether it breaks any existing tests. The
change still needs new tests to prove it is OK, as I have only tested it
manually so far.
> Checksums are not checked if block meta file is less than 7 bytes
> -----------------------------------------------------------------
>
> Key: HDFS-14706
> URL: https://issues.apache.org/jira/browse/HDFS-14706
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Attachments: HDFS-14706.001.patch
>
>
> If a block and its meta file are corrupted in a certain way, the corruption
> can go unnoticed by a client, causing it to return invalid data.
> The meta file is expected to always have a header of 7 bytes and then a
> series of checksums depending on the length of the block.
> If the meta file gets corrupted in such a way that it is shorter than 7
> bytes, then the header is incomplete. In BlockSender.java the logic checks
> whether the meta file length is at least the size of the header; if it is
> not, it does not raise an error, but instead returns a NULL checksum type to
> the client.
> https://github.com/apache/hadoop/blob/b77761b0e37703beb2c033029e4c0d5ad1dce794/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java#L327-L357
> If the client receives a NULL checksum type, it will not validate checksums
> at all, and even corrupted data will be returned to the reader. This means
> the corruption will go unnoticed and HDFS will never repair it. Even the
> Volume Scanner will not notice the corruption, as the checksums are silently
> ignored.
> Additionally, if the meta file has enough bytes that the datanode attempts
> to load the header, and the header is corrupted such that it is not valid,
> it can cause the datanode Volume Scanner to exit with an exception like the
> following:
> {code}
> 2019-08-06 18:16:39,151 ERROR datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data, DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting because of exception
> java.lang.IllegalArgumentException: id=51 out of range [0, 5)
>     at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
>     at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
>     at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:173)
>     at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:139)
>     at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:153)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.loadLastPartialChunkChecksum(FsVolumeImpl.java:1140)
>     at org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.loadLastPartialChunkChecksum(FinalizedReplica.java:157)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.getPartialChunkChecksumForFinalized(BlockSender.java:451)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:266)
>     at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:446)
>     at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:558)
>     at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:633)
> 2019-08-06 18:16:39,152 INFO datanode.VolumeScanner: VolumeScanner(/tmp/hadoop-sodonnell/dfs/data, DS-7f103313-61ba-4d37-b63d-e8cf7d2ed5f7) exiting.
> {code}
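
To illustrate the two failure modes described in the issue, here is a rough
standalone sketch of a stricter header check. It is not the attached patch and
not Hadoop code: the class MetaHeaderCheck, its Header type and parseHeader()
are hypothetical names. It assumes the 7-byte header is laid out as a 2-byte
version, a 1-byte checksum type id and a 4-byte bytes-per-checksum value, and
it takes the valid id range [0, 5) from the exception message in the issue.
Both a truncated header and an out-of-range type id are reported as an
IOException, so a caller could treat the replica as corrupt rather than fall
back to a NULL checksum or die on an unchecked exception.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical helper (not part of Hadoop): strict parsing of a meta file header. */
final class MetaHeaderCheck {
  // Assumed layout: 2-byte version + 1-byte checksum type id + 4-byte bytesPerChecksum.
  static final int HEADER_LEN = 7;
  // Assumption taken from the "out of range [0, 5)" message in the stack trace.
  static final int NUM_CHECKSUM_TYPES = 5;

  /** Plain value holder for the three header fields. */
  static final class Header {
    final short version;
    final int typeId;
    final int bytesPerChecksum;

    Header(short version, int typeId, int bytesPerChecksum) {
      this.version = version;
      this.typeId = typeId;
      this.bytesPerChecksum = bytesPerChecksum;
    }
  }

  /** Parses the header, raising IOException for truncated or invalid input. */
  static Header parseHeader(byte[] meta) throws IOException {
    // Case 1: fewer than 7 bytes. Fail loudly instead of assuming "no checksum".
    if (meta.length < HEADER_LEN) {
      throw new IOException("Meta file is " + meta.length
          + " bytes, shorter than the " + HEADER_LEN + "-byte header");
    }
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(meta));
    short version = in.readShort();
    int typeId = in.readByte();
    int bytesPerChecksum = in.readInt();
    // Case 2: enough bytes, but the type id is garbage (for example id=51).
    if (typeId < 0 || typeId >= NUM_CHECKSUM_TYPES) {
      throw new IOException("Corrupt meta header: checksum type id=" + typeId
          + " out of range [0, " + NUM_CHECKSUM_TYPES + ")");
    }
    if (bytesPerChecksum <= 0) {
      throw new IOException("Corrupt meta header: bytesPerChecksum="
          + bytesPerChecksum);
    }
    return new Header(version, typeId, bytesPerChecksum);
  }
}
```

With this sketch, a 3-byte meta file and a header carrying type id 51 both
surface as an IOException, which a scanner loop could catch per block and
report as corruption instead of exiting.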