Hi Brian,

If this is the case, is there any chance that the DataBlockScanner somehow cannot finish verifying all the blocks within three weeks (e.g., when a node has a very large number of blocks)?

Thanh

On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

> Hi Thanh,
>
> That is correct. Last time I read the code, Hadoop scheduled the block
> verifications randomly throughout the period in order to avoid periodic
> effects (i.e., high load every N minutes).
>
> Brian
>
> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
>
> > Brian,
> >
> > When you say *attempt* to complete an *entire* node scan,
> > you mean, for example, that if a node has 100 block files, it will
> > try to verify all 100 blocks every 3 weeks?
> > That is, on average, a block is scanned every (3 weeks / 100) time
> > interval?
> >
> > Thanks
> > Thanh
> >
> > On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> >
> >> Hi Thanh,
> >>
> >> The scan period is the period in which Hadoop *attempts* to complete an
> >> entire node scan. That is, if it's set to 3 weeks, HDFS will try to scan
> >> each block once every 3 weeks.
> >>
> >> Obviously, depending on the bandwidth you have made available to the
> >> scanning thread, you can specify impossibly small periods.
> >>
> >> Brian
> >>
> >> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
> >>
> >>> Hi again,
> >>>
> >>> Could anybody explain to me the scanning period
> >>> policy of DataBlockScanner? That is, how often does it wake up
> >>> and scan a block file?
> >>> When looking at the code, I found
> >>>
> >>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
> >>>
> >>> but definitely it does not wake up and pick a random block
> >>> to verify every three weeks, right?
> >>>
> >>> Thanks a lot,
> >>> Thanh
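
For reference, the arithmetic discussed in this thread can be sketched as below. This is only a back-of-the-envelope illustration, not the DataBlockScanner code: the class ScanPeriodEstimate, its two helper methods, and the 1 MiB/s read rate are hypothetical names and figures chosen for the example. Only DEFAULT_SCAN_PERIOD_HOURS comes from the source quoted above.

```java
// Hypothetical helper, not part of Hadoop: estimates how scan period,
// block count, and scan bandwidth relate to each other.
class ScanPeriodEstimate {
    // Value quoted in the thread from DataBlockScanner.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L; // three weeks

    /** Average gap between two block verifications, in milliseconds. */
    static long averageIntervalMs(long scanPeriodHours, long blockCount) {
        if (blockCount <= 0) {
            throw new IllegalArgumentException("blockCount must be positive");
        }
        return scanPeriodHours * 3_600_000L / blockCount;
    }

    /**
     * Whether reading totalBytes at bytesPerSecond fits into the scan period.
     * bytesPerSecond is an assumed scan throughput, not a Hadoop setting.
     */
    static boolean canFinish(long scanPeriodHours, long totalBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        // Round up so a partial second still counts as time needed.
        long secondsNeeded = (totalBytes + bytesPerSecond - 1) / bytesPerSecond;
        return secondsNeeded <= scanPeriodHours * 3_600L;
    }

    public static void main(String[] args) {
        long oneMiBPerSecond = 1L << 20; // assumed scan rate for illustration
        long oneTiB = 1L << 40;

        // 100 blocks spread over the default three-week period.
        System.out.println("Average interval (ms): "
                + averageIntervalMs(DEFAULT_SCAN_PERIOD_HOURS, 100));

        // Does a full pass fit into the period at the assumed rate?
        System.out.println("1 TiB fits: "
                + canFinish(DEFAULT_SCAN_PERIOD_HOURS, oneTiB, oneMiBPerSecond));
        System.out.println("4 TiB fits: "
                + canFinish(DEFAULT_SCAN_PERIOD_HOURS, 4 * oneTiB, oneMiBPerSecond));
    }
}
```

With 100 blocks and the default period, the average gap works out to about five hours per block. Whether a full pass fits depends entirely on the throughput you assume for the scanning thread, which is the point Brian makes about impossibly small periods.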