On Thu, Oct 7, 2010 at 10:08 PM, Sean Bigdatafun <sean.bigdata...@gmail.com> wrote:

> Is there a pointer where I can find details of the write path in HDFS? In
> particular, I'd like to get some technical figures describing the following
> puzzle in my mind:
>
> * Is there a 64KB block-wise checksum within the 64MB blocks (as
> described in Section 5.2 in the GFS paper)? or HDFS keeps a whole-block
> (64 MB) wise single checksum?
>

Checksums are on 512 byte chunks. This is theoretically configurable, but I
have a lot of doubts that it would actually work with a different value,
since I've never heard of anyone changing it :)

> * It seems that HDFS' staging strategy," In fact, initially the HDFS
> client caches the file data into a temporary local file. Application writes
> are transparently redirected to this temporary local file" , is quite
> different from the original GFS paper (see Section 2.3 of GFS paper "neither
> client nor the chunkserver caches file data"). Can someone help me
> understanding it ?
>

People keep referencing this on the list, but it hasn't been that way in
about 3 years :) Where do you see this, so we can update the docs?

> * Both HDFS document and GFS paper mentioned that Namenode poll
> Datanodes periodically (BlockReport) to get their most up-to-date
> information. Can someone tell me what exact info "BlockReport" contain or
> tell me the class name that I can look up in the Javadoc?
>

Look at the NameNode.java class - the block reports come in via RPC to
there.

> Plus, is the block-id treated as file name in the datanode's local
> filesystem?
>

In the DN, each block is two files: blk_NNNNN and blk_NNNNN_GS.meta, where
GS is a generation stamp. The meta file contains checksums.
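
To make the two answers above concrete, here is a rough Python sketch of per-chunk checksumming and of the two file names a DN keeps per block. It is an illustration only, not HDFS code: the use of CRC32, the function names, and the sample block id and generation stamp are all assumptions made for the example, and it does not try to reproduce the real on-disk layout of the .meta file.

```python
import zlib

# Chunk size taken from the reply above; everything else here is illustrative.
BYTES_PER_CHECKSUM = 512


def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM) -> list:
    """Return one CRC32 per chunk_size-byte chunk (the last chunk may be short)."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def first_corrupt_chunk(data: bytes, expected: list,
                        chunk_size: int = BYTES_PER_CHECKSUM):
    """Return the index of the first chunk whose checksum differs, or None."""
    actual = chunk_checksums(data, chunk_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    return None


def block_file_names(block_id: int, generation_stamp: int) -> tuple:
    """Build the data and meta file names in the blk_NNNNN / blk_NNNNN_GS.meta pattern."""
    data_name = f"blk_{block_id}"
    return data_name, f"{data_name}_{generation_stamp}.meta"


if __name__ == "__main__":
    payload = bytes(range(256)) * 5 + b"xyz"   # 1283 bytes, so three chunks
    sums = chunk_checksums(payload)
    damaged = bytearray(payload)
    damaged[600] ^= 0xFF                       # flip bits inside the second chunk
    print(len(sums), first_corrupt_chunk(bytes(damaged), sums))
    print(block_file_names(12345, 1001))       # hypothetical id and stamp
```

The point of checksumming small chunks rather than the whole block is that a reader can verify just the range it reads and can tell which 512-byte region went bad, instead of having to rescan a full block.
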
> Here is my guess-standing:
> --- 1) I think the reason why losing Namenode metadata can cause HDFS
> cluster data total loss is because "BlockReport" does not contain the
> mapping between a HDFS filename and the block-ids (otherwise, the polled
> data may be sufficient to reconstruct the overall HDFS metadata view), so
> I'd like to understand more details.
>

Correct, the DNs have no concept of filename.

> --- 2) Namenode's metadata contains "{filename, n-th block} -->
> block-id", and serve as the final authority (from checkpoint and edit log).
> But the metadata does not contain "block-id --> {machineA, machineB,
> machineC}" -- instead, it waits for the BlockReport info from Datanodes.
>

Correct. You can move blocks between DNs while the NN is down and no one
will be the wiser.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
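
As a rough illustration of points 1) and 2), the split between persisted and rebuilt metadata can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the NameNode's actual data structures: the class name NameNodeSketch, its methods, and the paths and DN names used with it are all hypothetical, and a block report is reduced to a bare list of block ids.

```python
from collections import defaultdict


class NameNodeSketch:
    """Toy model: the namespace is persisted, block locations are not."""

    def __init__(self, namespace):
        # Persistent part (checkpoint plus edit log in the thread's terms):
        # filename -> ordered list of block ids.
        self.namespace = {name: list(blocks) for name, blocks in namespace.items()}
        # In-memory part: block id -> set of DNs, learned only from reports.
        self.locations = defaultdict(set)

    def restart(self):
        """Simulate an NN restart: the namespace survives, locations do not."""
        self.locations.clear()

    def receive_block_report(self, datanode, block_ids):
        """Replace whatever was known about this DN with its latest report."""
        for holders in self.locations.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.locations[block_id].add(datanode)

    def locate(self, filename):
        """Return [(block id, sorted DN list)] for each block of the file."""
        return [
            (block_id, sorted(self.locations.get(block_id, ())))
            for block_id in self.namespace[filename]
        ]


if __name__ == "__main__":
    nn = NameNodeSketch({"/logs/a": [1, 2]})
    nn.receive_block_report("dn1", [1, 2])
    nn.restart()
    # Block 1 was moved from dn1 to dn3 while the NN was down.
    nn.receive_block_report("dn1", [2])
    nn.receive_block_report("dn3", [1])
    print(nn.locate("/logs/a"))
```

In this model the reports can always rebuild the locations map, but nothing a DN sends could rebuild the namespace, because the reports carry block ids and no filenames.
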