Sriram Rao wrote:
Does this read every block of every file from all replicas and verify
that the checksums are good?

Sriram

The DataBlockScanner thread on every datanode does this for you automatically. You can tune how often it runs, but it reads all of the datanode's local blocks, verifies them against their stored CRC checksums, and reports any failures to the namenode. Since every datanode scans its own blocks, every replica gets checked. After that, it's the namenode's problem how to deal with the corrupt block. In an ideal system, at least one non-corrupt copy of the block is still live, so the namenode can re-replicate from it.
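To make the idea concrete, here is a rough sketch of what such a scan amounts to. It is not the DataBlockScanner code: the function names, the in-memory dict of blocks, and the 512-byte chunk size are all assumptions for illustration only.

```python
import zlib

# Assumed chunk size for this sketch: one CRC per 512 bytes of block data.
BYTES_PER_CHECKSUM = 512


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per chunk of the block's bytes."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def scan_blocks(blocks, stored):
    """Compare each local block with its stored checksums.

    blocks: hypothetical mapping of block id -> block bytes on this node
    stored: hypothetical mapping of block id -> checksums saved at write time
    Returns the sorted ids of blocks that fail verification, i.e. the list
    a datanode would report to the namenode.
    """
    corrupt = []
    for block_id, data in blocks.items():
        if compute_checksums(data) != stored.get(block_id):
            corrupt.append(block_id)
    return sorted(corrupt)


# Two healthy blocks, checksummed when they were written.
blocks = {"blk_1": b"a" * 1000, "blk_2": b"b" * 1000}
stored = {bid: compute_checksums(data) for bid, data in blocks.items()}

# Simulate bit rot in one block, then rescan.
blocks["blk_2"] = b"b" * 999 + b"x"
failed = scan_blocks(blocks, stored)
```

Here `failed` ends up holding only `blk_2`; what happens to that block afterwards is up to the namenode.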

The configuration attribute dfs.datanode.scan.period.hours tunes the scan rate: it sets the period, in hours, over which the scanner works through the datanode's blocks.
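For example, something like this in hdfs-site.xml on the datanodes. The value 168 (one week) is just an arbitrary illustration, not a recommendation; check the default for your Hadoop version before changing it.

```xml
<!-- hdfs-site.xml fragment; 168 is an illustrative value only -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>168</value>
</property>
```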
