From a code perspective, the Namenode and Datanode are in sync in all
critical matters. But there is a possibility that the request from a
Namenode to a Datanode to delete a block might not have been received by
the Datanode because of a bad connection. This means that there could be
a leakage of storage. However, processing a full block report every hour
is too heavy-weight a way to solve this problem. My proposal is to make
block reports occur very infrequently (maybe once every day).

But when blocks get removed from under the Datanode, we would like to
detect this situation as soon as possible. Thus, it makes sense to
compare the Datanode data structures with what is on the disk once every
hour or so.
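
A minimal sketch of that hourly comparison, under assumptions: block
files are taken to be named "blk_<id>", and the class and method names
(BlockScanSketch, parseBlockIds, compare) are hypothetical, not existing
Datanode code. A full block report would be triggered only when the
result is non-empty.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: compares the Datanode's in-memory block ids with block files on disk. */
class BlockScanSketch {

    /** Outcome of one scan; both sets empty means memory and disk agree. */
    static final class Discrepancy {
        final Set<Long> missingOnDisk = new TreeSet<>(); // known in memory, file is gone
        final Set<Long> unknownOnDisk = new TreeSet<>(); // file exists, not known in memory

        boolean isEmpty() {
            return missingOnDisk.isEmpty() && unknownOnDisk.isEmpty();
        }
    }

    /** Extracts block ids from file names, assuming the form "blk_<id>"; other names are skipped. */
    static Set<Long> parseBlockIds(Collection<String> fileNames) {
        Set<Long> ids = new HashSet<>();
        for (String name : fileNames) {
            if (!name.startsWith("blk_")) {
                continue;
            }
            try {
                ids.add(Long.parseLong(name.substring("blk_".length())));
            } catch (NumberFormatException e) {
                // Not a plain block file (for example a metadata file); ignore it.
            }
        }
        return ids;
    }

    /** Computes the two-way difference between the in-memory view and the on-disk view. */
    static Discrepancy compare(Set<Long> inMemory, Set<Long> onDisk) {
        Discrepancy d = new Discrepancy();
        for (Long id : inMemory) {
            if (!onDisk.contains(id)) {
                d.missingOnDisk.add(id);
            }
        }
        for (Long id : onDisk) {
            if (!inMemory.contains(id)) {
                d.unknownOnDisk.add(id);
            }
        }
        return d;
    }
}
```

The file names would come from listing the Datanode's data directories;
that step is left out so the comparison itself stays easy to test.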

Implementing partial incremental block reports as you suggested is a good
idea too, but maybe we do not need them if we do the above. Since full
block reports will be sent only very rarely (maybe once a day), maybe we
can live with the current implementation for the daily block reports?

Thanks,
dhruba

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 30, 2008 9:07 AM
To: core-user@hadoop.apache.org
Subject: Re: Block reports: memory vs. file system, and Dividing
offerService into 2 threads

dhruba Borthakur wrote:
> My current thinking is that "block report processing" should compare
> the blkxxx files on disk with the data structure in the Datanode
> memory. If and only if there is some discrepancy between these two,
> then a block report be sent to the Namenode. If we do this, then we
> will practically get rid of 99% of block reports.

Doesn't this assume that the namenode and datanode are 100% in sync? 
Another purpose of block reports is to make sure that the namenode and 
datanode agree, since failed RPCs, etc. might have permitted them to 
slip out of sync.  Or are we now confident that these are never out of 
sync?  Perhaps we should start logging whenever a block report
surprises?

Long ago we talked of implementing partial, incremental block reports. 
We'd divide blockid space into 64 sections.  The datanode would ask the 
namenode for the hash of its block ids in a section.  Full block lists 
would then only be sent when the hash differs.  Both sides would 
maintain hashes of all sections in memory.  Then, instead of making a 
block report every hour, we'd make a 1/64 block id check every minute.
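
A rough sketch of the per-section hashing, under assumptions: a block's
section is taken to be its id modulo 64, and the section hash is an XOR
of bit-mixed ids so it can be updated incrementally on add and remove.
The class SectionedBlockHashes and its methods are hypothetical; neither
the section mapping nor the hash function was specified in the original
discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: one side's view of its blocks, with a running hash per section. */
class SectionedBlockHashes {
    static final int SECTIONS = 64;

    private final long[] hashes = new long[SECTIONS];
    private final Set<Long> blocks = new HashSet<>();

    /** Assumed mapping of a block id to a section: id modulo 64, always in [0, 63]. */
    static int sectionOf(long blockId) {
        return (int) Math.floorMod(blockId, (long) SECTIONS);
    }

    /** Arbitrary 64-bit mixing step so that XOR-ing ids does not collide trivially. */
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** XOR is its own inverse, so add and remove both toggle the block's contribution. */
    void add(long blockId) {
        if (blocks.add(blockId)) {
            hashes[sectionOf(blockId)] ^= mix(blockId);
        }
    }

    void remove(long blockId) {
        if (blocks.remove(blockId)) {
            hashes[sectionOf(blockId)] ^= mix(blockId);
        }
    }

    long sectionHash(int section) {
        return hashes[section];
    }

    /** The full block list for one section, sent only when the hashes differ. */
    List<Long> blocksIn(int section) {
        List<Long> result = new ArrayList<>();
        for (Long id : blocks) {
            if (sectionOf(id) == section) {
                result.add(id);
            }
        }
        Collections.sort(result);
        return result;
    }

    /** The per-minute check: true when this section needs a full block list exchange. */
    static boolean needsFullList(SectionedBlockHashes datanode,
                                 SectionedBlockHashes namenode, int section) {
        return datanode.sectionHash(section) != namenode.sectionHash(section);
    }
}
```

Matching hashes only make agreement very likely, not certain, so an
occasional full report would still be a reasonable backstop.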

Doug
