Ravi Phulari wrote:
Hello Roman,

If you have a huge cluster, it is good to have the JobTracker and NameNode running 
on different machines.
If your cluster is small (roughly 20-30 machines or fewer), you can run the 
JobTracker and NameNode on the same machine.
Again, it depends on the hardware configuration. Usually the NameNode and JobTracker 
machines have a higher hardware specification than the data nodes.


It depends on how big your cluster is and how much data you have in HDFS.
NameNode memory usage is directly proportional to the size of HDFS and the number of 
files and directories on it. Each file's or directory's metadata and inode information is 
stored in the NameNode namespace, which is held in main memory, so memory use grows with 
the number of files and directories. Going by the bytes needed to store the metadata of 
each HDFS file in the namespace, NameNode memory requirements can be summarized 
as "10 million files require 4 GB of memory for NameNode".
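To make the rule of thumb concrete, here is a rough sketch that scales it linearly to other file counts. The linear scaling follows from the "directly proportional" statement above; the function name and constants are only illustrative, and real usage will also depend on directories, blocks per file, and the Hadoop version.

```python
# Rule of thumb quoted above: 10 million files need about 4 GB of NameNode memory.
# If a GB is read as 2**30 bytes, that works out to roughly 430 bytes per file.
FILES_PER_RULE = 10_000_000
GB_PER_RULE = 4.0


def estimate_namenode_memory_gb(num_files):
    """Linearly scale the rule of thumb to num_files (hypothetical helper)."""
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    return num_files / FILES_PER_RULE * GB_PER_RULE


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 25_000_000):
        print(f"{n:>12,} files -> ~{estimate_namenode_memory_gb(n):.1f} GB")
```

Treat the result as a starting point for sizing, not a guarantee.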

For a small cluster you can have the NameNode and JobTracker running on the same 
machine.

I'd start off with two DNS entries "namenode" and "jobtracker" both pointing to the same box. If you need to split the machines later, all bookmarked URLs and configuration files will remain the same, which will keep users happier.
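As a sketch of what that looks like in the configuration, the snippet below builds the two relevant properties from the aliases rather than from a real machine name. The property names `fs.default.name` and `mapred.job.tracker` are the ones I know from Hadoop releases of this era, so check them against your version; the port numbers and the helper functions are assumptions for illustration only.

```python
from xml.sax.saxutils import escape

# DNS aliases suggested above; both may resolve to the same box today.
NAMENODE_HOST = "namenode"
JOBTRACKER_HOST = "jobtracker"


def site_properties(namenode_port=8020, jobtracker_port=8021):
    """Return the two properties, keyed by name (ports are assumed values)."""
    return {
        "fs.default.name": f"hdfs://{NAMENODE_HOST}:{namenode_port}",
        "mapred.job.tracker": f"{JOBTRACKER_HOST}:{jobtracker_port}",
    }


def render_configuration(props):
    """Render the properties in the Hadoop <configuration> XML layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_configuration(site_properties()))
```

Because only the aliases appear in the files, moving the JobTracker to its own machine later means changing one DNS record, not every client's configuration.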
