Are these really tiny files, or are you really storing 2M x 100MB = 200TB of data? Or do you have more like 2M x 10KB = 20GB of data?
Map-reduce and HDFS will generally work much better if you can arrange to have relatively larger files.

On 7/15/07 8:04 AM, "erolagnab" <[EMAIL PROTECTED]> wrote:

> I have an HDFS with 2 datanodes and 1 namenode on 3 different machines, 2G RAM each.
> Datanode A contains around 700,000 blocks and Datanode B contains 1,200,000+ blocks.
> The namenode fails to start due to running out of memory when trying to add Datanode B
> into its rack. I have adjusted the Java heap to 1600MB, which is the maximum, but it
> still runs out of memory.
>
> AFAIK, the namenode loads all block information into memory. If so, is there any way
> to estimate how much RAM is needed for an HDFS with a given number of blocks on each
> datanode?
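For a rough back-of-the-envelope figure (a sketch, not an exact model), you can assume something on the order of 150 bytes of namenode heap per namespace object (file, directory, and block). The little Java snippet below uses that assumed rule of thumb; the class name and the counts are only illustrative placeholders.

// Rough namenode heap estimate (a sketch under the assumptions above).
// Assumes roughly 150 bytes of heap per namespace object (file, directory, block);
// the counts below are illustrative, not measured values.
public class NameNodeHeapEstimate {

    static final long BYTES_PER_OBJECT = 150L; // assumed rule of thumb

    static long estimateHeapBytes(long files, long directories, long blocks) {
        long objects = files + directories + blocks;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        long files = 1900000L;       // assuming roughly one block per small file
        long directories = 10000L;   // illustrative guess
        long blocks = 1900000L;      // ~700,000 + ~1,200,000 from the two datanodes
        long heapBytes = estimateHeapBytes(files, directories, blocks);
        System.out.printf("Estimated namenode heap: %.1f MB%n",
                heapBytes / (1024.0 * 1024.0));
    }
}

With ~1.9M blocks that works out to a few hundred MB by this estimate; actual usage can be considerably higher depending on path lengths, replication metadata, and JVM overhead, which is part of why consolidating small files into larger ones helps so much.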
