HDFS has block checksums. Whenever a block is written to the datanodes, a checksum is calculated and written with the block to the datanodes' disks.
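On the datanode side you can actually see those checksum files sitting next to the block files in the data directory. A rough sketch (block ID and generation stamp are made up here, and the exact layout depends on your dfs.data.dir setting):

  ${dfs.data.dir}/current/blk_1073741825            <- the block data itself
  ${dfs.data.dir}/current/blk_1073741825_1001.meta  <- the checksums for that block
                                                       (by default a CRC for every 512 bytes,
                                                        see io.bytes.per.checksum)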
Whenever a block is read, its checksum is recomputed and verified against the stored checksum. If they don't match, that replica is corrupt. But since there are additional replicas of the block, chances are high that another replica still matches its checksum, and corrupt replicas are scheduled for re-replication from a good copy. Also, to prevent bit rot, blocks are checked periodically in the background (weekly by default, I believe; you can configure that period). See the PS at the bottom for how to check your data yourself and how to tune that scan period.

Kai

On 25.06.2012 at 13:29, Rita wrote:

> Does Hadoop, HDFS in particular, do any sanity checks of the file before
> and after balancing/copying/reading the files? We have 20TB of data and I
> want to make sure after these operations are completed the data is still in
> good shape. Where can I read about this?
>
> tia
>
> --
> --- Get your facts first, then you can distort them as you please.--

--
Kai Voigt
k...@123.org
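PS: If you want to verify your 20TB yourself after the balancing/copying, fsck is the easiest way to spot corrupt, missing or under-replicated blocks (the path is just a placeholder):

  hadoop fsck /path/to/your/data -files -blocks -locations

And the background scan period I mentioned can be changed in hdfs-site.xml via dfs.datanode.scan.period.hours; the value below is only an example (168 hours = weekly):

  <property>
    <name>dfs.datanode.scan.period.hours</name>
    <value>168</value>
  </property>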