Oh, now I see the problem. The implication here is that some blocks might not be scanned for a very long time, because the scanner may not finish scanning all the blocks within 3 weeks, and after that it starts over again, ...
Interesting, thanks for the prompt reply, Brian.

Thanh

On Wed, Oct 13, 2010 at 7:37 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

>
> On Oct 13, 2010, at 7:29 PM, Thanh Do wrote:
>
> > Hi Brian,
> >
> > If this is the case, then is there any chance that
> > somehow the DataBlockScanner cannot finish
> > the verification of all the blocks in three weeks
> > (e.g., a node has a very large number of blocks)?
> >
>
> Yes. At some point, I'd really like to figure out what percentage of our
> blocks actually get scanned at our site; I suspect some go very long without
> a scan.
>
> Brian
>
> > Thanh
> >
> > On Wed, Oct 13, 2010 at 7:18 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> >
> >> Hi Thanh,
> >>
> >> That is correct. Last time I read the code, Hadoop scheduled the block
> >> verifications randomly throughout the period in order to avoid periodic
> >> effects (i.e., high load every N minutes).
> >>
> >> Brian
> >>
> >> On Oct 13, 2010, at 7:14 PM, Thanh Do wrote:
> >>
> >>> Brian,
> >>>
> >>> When you say *attempt* to complete an *entire* node scan,
> >>> you mean, for example, if a node has 100 block files, it will
> >>> try to verify all 100 blocks every 3 weeks?
> >>> That is, on average, a block is scanned every (3 weeks / 100 time interval)?
> >>>
> >>> Thanks
> >>> Thanh
> >>>
> >>>
> >>> On Wed, Oct 13, 2010 at 7:07 PM, Brian Bockelman <bbock...@cse.unl.edu> wrote:
> >>>
> >>>> Hi Thanh,
> >>>>
> >>>> The scan period is the period in which Hadoop *attempts* to complete an entire
> >>>> node scan. That is, if it's set to 3 weeks, HDFS will try to scan each
> >>>> block once every 3 weeks.
> >>>>
> >>>> Obviously, depending on the bandwidth you have made available to the
> >>>> scanning thread, you can specify impossibly small periods.
> >>>>
> >>>> Brian
> >>>>
> >>>> On Oct 13, 2010, at 7:01 PM, Thanh Do wrote:
> >>>>
> >>>>> Hi again,
> >>>>>
> >>>>> Could anybody explain to me the scanning period
> >>>>> policy of DataBlockScanner? That is, how often it wakes up
> >>>>> and scans a block file.
> >>>>> When looking at the code, I found
> >>>>>
> >>>>> static final long DEFAULT_SCAN_PERIOD_HOURS = 21*24L; // three weeks
> >>>>>
> >>>>>
> >>>>> but definitely it does not wake up and pick a random block
> >>>>> to verify every three weeks, right?
> >>>>>
> >>>>> Thanks a lot,
> >>>>> Thanh
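
For anyone who wants to sanity-check whether a full node scan can fit into the scan period, here is a minimal back-of-the-envelope sketch. It is not Hadoop code: the class `ScanPeriodEstimate` and its methods are hypothetical, and the data sizes and the 1 MiB/s scan bandwidth are assumed figures for illustration only. The only value taken from this thread is the `DEFAULT_SCAN_PERIOD_HOURS` constant.

```java
// Hypothetical helper, NOT part of Hadoop: rough arithmetic only.
final class ScanPeriodEstimate {
    // Constant as quoted earlier in this thread.
    static final long DEFAULT_SCAN_PERIOD_HOURS = 21 * 24L; // three weeks

    /** Hours needed to read totalBytes once at a fixed scan bandwidth. */
    static double hoursForFullScan(long totalBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        return (double) totalBytes / bytesPerSecond / 3600.0;
    }

    /** True if one full pass over the data fits inside the scan period. */
    static boolean fitsInPeriod(long totalBytes, long bytesPerSecond, long periodHours) {
        return hoursForFullScan(totalBytes, bytesPerSecond) <= periodHours;
    }

    /** Average gap, in seconds, between verifications if blockCount blocks
     *  are spread evenly over the period (each block scanned once). */
    static double averageSecondsBetweenScans(long blockCount, long periodHours) {
        if (blockCount <= 0) {
            throw new IllegalArgumentException("blockCount must be positive");
        }
        return periodHours * 3600.0 / blockCount;
    }

    public static void main(String[] args) {
        long oneTiB = 1024L * 1024 * 1024 * 1024; // assumed amount of block data
        long oneMiBPerSec = 1024L * 1024;         // assumed scan bandwidth

        System.out.println(hoursForFullScan(oneTiB, oneMiBPerSec));
        System.out.println(fitsInPeriod(oneTiB, oneMiBPerSec, DEFAULT_SCAN_PERIOD_HOURS));
        System.out.println(fitsInPeriod(2 * oneTiB, oneMiBPerSec, DEFAULT_SCAN_PERIOD_HOURS));
        System.out.println(averageSecondsBetweenScans(100, DEFAULT_SCAN_PERIOD_HOURS));
    }
}
```

With the 100-block example from the thread, a three-week period works out to one verification roughly every five hours on average. Under the assumed 1 MiB/s bandwidth, 1 TiB of block data fits in the period but 2 TiB does not, which is the situation in which some blocks would go a long time without a scan.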