On Nov 23, 2010, at 7:41 PM, Thanh Do wrote:

> sorry for digging up this old thread.
> 
> Brian, is this the reason you want to add a "data-level" scan
> to HDFS, as in HDFS-221?
> 
> It seems to me that a very rarely read block could
> be silently corrupted, because the DataBlockScanner
> never finishes its scanning job within 3 weeks...
> 
> 

Why?  What if you restarted your datanode once every 2 weeks?  Last I checked, 
HDFS randomly assigns each block a verification time spread throughout the scan 
interval.  Because HDFS also rate-limits the scanner, if you have too many blocks 
for the configured interval you can easily end up in a situation where some 
blocks never get verified.
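
To put rough numbers on it, here's a quick back-of-envelope sketch in Java.  The 
figures are made up for illustration and are not the actual DataBlockScanner 
defaults:

    public class ScanCoverageEstimate {
        public static void main(String[] args) {
            // Assumed, illustrative numbers -- not the real DataBlockScanner defaults.
            long numBlocks = 200000L;                // blocks stored on one datanode
            long avgBlockBytes = 64L * 1024 * 1024;  // 64 MB average block size
            long scanBytesPerSec = 1L * 1024 * 1024; // throttled scan bandwidth, ~1 MB/s
            long periodSecs = 21L * 24 * 3600;       // 3-week verification period

            long bytesToScan = numBlocks * avgBlockBytes;
            long bytesScannable = scanBytesPerSec * periodSecs;

            System.out.printf("need to scan %d GB, can scan %d GB per period -> %s%n",
                    bytesToScan >> 30, bytesScannable >> 30,
                    bytesScannable >= bytesToScan ? "every block gets verified"
                                                  : "some blocks are never verified");
        }
    }

With those assumed numbers the node holds ~12 TB of block data but the throttled 
scanner can only cover ~1.7 TB per period, so most blocks never get a turn.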

The reason one wants a data-level scan is so an admin can manually verify that all 
copies of a file are good (well, "good" relative to the checksum... maybe the user 
corrupted it before uploading it :).  It'd be a great debugging tool to put site 
admins' minds at ease.

Brian
