On Thu, Aug 25, 2011 at 1:34 PM, Sesha Kumar <sesha...@gmail.com> wrote:
> Hi all,
> I am trying to get a good understanding of how Hadoop works for my
> undergraduate project. I have the following questions/doubts:
>
> 1. Why does the namenode store the blockmap (block-to-datanode mapping) in
> main memory for all files, even those that are not used?
>
> 2. Why can't the namenode move part of the blockmap from main memory to a
> secondary storage device when free space in main memory becomes scarce
> (due to a large number of files)?
>
> 3. Why can't the blockmap be constructed when a file is requested (by a
> client) and then be cached for later accesses?

Regarding my earlier post quoted above, from what I've read and understood:

1. The namenode stores the blockmap for all blocks in its main memory. This lets it keep an up-to-date snapshot of the whole filesystem. But it seems to me the blockmap is not constant data, so keeping all of it in main memory all the time could be avoided to save memory. On a client's request for a file, the blockmap details could be fetched on demand. Since main memory is the constraint on adding too many files to the filesystem (as in the small-files case), this approach could save space. Only the first fetch would take more time; after that we could have streaming data access.

I want to know why this was not considered, or, if it was considered, why it was not implemented. Am I missing anything obvious? The namenode learns block locations from the datanodes' heartbeat and block-report messages. I am not sure about the time trade-off. Would it be much bigger? Is initial access time as important as streaming access?
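To make the memory constraint concrete, here is a rough back-of-the-envelope sketch. It assumes about 150 bytes of namenode heap per namespace object (file, directory, or block), which is an often-cited rule of thumb rather than an exact figure from this thread; the function and numbers are illustrative only.

```python
# Hypothetical estimate of namenode heap held resident for filesystem metadata.
# ASSUMPTION: ~150 bytes per namespace object (file, directory, or block),
# a commonly quoted rule of thumb, not an exact Hadoop-internal figure.

BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Approximate heap the namenode keeps in memory for metadata."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT

# 100 million small files, one block each:
small = namenode_heap_bytes(100_000_000, blocks_per_file=1)
# 1 million large files, 100 blocks each (similar data volume):
large = namenode_heap_bytes(1_000_000, blocks_per_file=100)

print(f"100M small files: ~{small / 2**30:.1f} GiB of heap")
print(f"1M large files:   ~{large / 2**30:.1f} GiB of heap")
```

The sketch shows why many small files hurt: for a similar amount of stored data, the small-file layout needs roughly twice the heap, because per-file overhead dominates the per-block overhead.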