Hi,

Can you explain in more detail this part of your answer:

> "The only way to avoid this is to make the data much more cacheable and to have a viable cache coherency strategy. Cache coherency at the meta-data level is difficult. Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance purposes."

Specifically:
- Why is cache coherency at the meta-data level difficult?
- Why is cache coherency at the block level also difficult (though not as difficult), given that many blocks get moved for balance purposes?

Thanks a lot!
kanghua
Date: Mon, 5 Sep 2011 21:52:53 -0700
Subject: Re: Regarding design of HDFS
From: dhr...@gmail.com
To: hdfs-user@hadoop.apache.org

My answers inline.

1. Why does the namenode store the blockmap (block to datanode mapping) in main memory for all files, even those that are not used?

The block to datanode mapping is needed for two reasons: when a client wants to read a file, the namenode has to tell the client the locations of the blocks that make up the file. Also, when a datanode dies, the namenode has to quickly find the blocks that resided on that datanode so that it can re-replicate those blocks.

2. Why can't the namenode move part of the blockmap out of main memory to a secondary storage device when free space in main memory becomes scarce (due to a large number of files)?

3. Why can't the blockmap be constructed when a file is requested (by a client) and then be cached for later accesses?

Both of the above can be done if needed. But when there is a better way to scale, why do this? Please look at my comments below.

The only way to avoid this is to make the data much more cacheable and to have a viable cache coherency strategy. Cache coherency at the meta-data level is difficult. Cache coherency at the block level is also difficult (but not as difficult) because many blocks get moved for balance purposes.

I would argue that a federation model is much more scalable, elegant, and easier to maintain. It takes a very well-oiled building block like the NameNode and allows you to use multitudes of them in a single cluster. This is already part of the HDFS trunk code base.

thanks
dhruba
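To make the first answer above concrete, here is a minimal, hypothetical sketch (not actual HDFS code; the `BlockMap` class and its method names are invented for illustration) of why the namenode keeps the block-to-datanode mapping in memory in both directions: a client read needs a fast block-to-datanodes lookup, and a datanode death needs a fast reverse (datanode-to-blocks) lookup so the namenode can immediately find what to re-replicate.

```python
from collections import defaultdict

class BlockMap:
    """Illustrative in-memory blockmap (not actual HDFS code).

    Keeps both directions of the block <-> datanode mapping so that
    both lookups the namenode must serve stay fast:
      - file reads: which datanodes hold a given block?
      - datanode death: which blocks lived on that datanode?
    """

    def __init__(self):
        self.block_to_nodes = defaultdict(set)   # block id -> set of datanodes
        self.node_to_blocks = defaultdict(set)   # datanode -> set of block ids

    def add_replica(self, block_id, datanode):
        self.block_to_nodes[block_id].add(datanode)
        self.node_to_blocks[datanode].add(block_id)

    def locate(self, block_id):
        """Answer a client read: where are the replicas of this block?"""
        return sorted(self.block_to_nodes[block_id])

    def datanode_died(self, datanode, target_replication=3):
        """On datanode death, drop its replicas and return the blocks
        that are now under-replicated and need re-replication."""
        under_replicated = []
        for block_id in self.node_to_blocks.pop(datanode, set()):
            self.block_to_nodes[block_id].discard(datanode)
            if len(self.block_to_nodes[block_id]) < target_replication:
                under_replicated.append(block_id)
        return sorted(under_replicated)

bm = BlockMap()
for dn in ("dn1", "dn2", "dn3"):
    bm.add_replica("blk_1", dn)
for dn in ("dn1", "dn2"):
    bm.add_replica("blk_2", dn)

print(bm.locate("blk_1"))       # ['dn1', 'dn2', 'dn3']
print(bm.datanode_died("dn1"))  # ['blk_1', 'blk_2'] -- both now under-replicated
```

Without the reverse map, finding the blocks lost with a dead datanode would require scanning every block in the namespace, which is exactly the kind of work the namenode cannot afford to do from secondary storage on every failure.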