>From a code perspective, the Namenode and Datanode are in sync in all critical matters. But there is a possibility that the request from a Namenode to a Datanode to delete a block might not have been received by the Datanode because of a bad connection. This means that there could be a leakage of storage. However, the current processing of block reports every hour is too heavy-weight a cost to solve this problem. My assumption is to make block reports occur very infrequently (maybe once every day).
But when blocks get removed from under the Datanode, we would like to detect this situation as soon as possible. Thus, it makes sense to compare the Datanode data structures with what is on the disk once every hour or so. Implementing partial incremental block reports as you suggested are good too, but maybe we do not need it if we do the above. Since full block reports will be sent only very rarely (maybe once every 1 day), maybe we can live with the current implementation for the daily block reports? Thanks, dhruba -----Original Message----- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 30, 2008 9:07 AM To: core-user@hadoop.apache.org Subject: Re: Block reports: memory vs. file system, and Dividing offerService into 2 threads dhruba Borthakur wrote: > My current thinking is that "block report processing" should compare the > blkxxx files on disk with the data structure in the Datanode memory. If > and only if there is some discrepancy between these two, then a block > report be sent to the Namenode. If we do this, then we will practically > get rid of 99% of block reports. Doesn't this assume that the namenode and datanode are 100% in sync? Another purpose of block reports is to make sure that the namenode and datanode agree, since failed RPCs, etc. might have permitted them to slip out of sync. Or are we now confident that these are never out of sync? Perhaps we should start logging whenever a block report surprises? Long ago we talked of implementing partial, incremental block reports. We'd divide blockid space into 64 sections. The datanode would ask the namenode for the hash of its block ids in a section. Full block lists would then only be sent when the hash differs. Both sides would maintain hashes of all sections in memory. Then, instead of making a block report every hour, we'd make a 1/64 block id check every minute. Doug