Ravi Phulari wrote:
Hello Roman,
If you have a large cluster, it is good to run the JobTracker and NameNode
on different machines.
If your cluster is small (fewer than roughly 20-30 machines), you can run
the JobTracker and NameNode on the same machine.
Again, it depends on the hardware configuration. The NameNode and JobTracker
machines usually have higher specifications than the DataNodes.
It also depends on how big your cluster is and how much data you keep in HDFS.
NameNode memory usage grows with the size of HDFS and is directly proportional
to the number of files and directories on it, because the metadata and inode
information for every file and directory is stored in the NameNode namespace,
which is held in main memory. Going by the number of bytes used to store the
metadata of each HDFS file in the namespace, the NameNode memory requirement
can be summarized as "10 million files require 4 GB of memory for NameNode".
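To make that rule of thumb concrete, here is a rough sketch that extrapolates
it linearly to other file counts. The function name is made up for
illustration, and the strict linear scaling and the reading of 4 GB as
4 * 1024**3 bytes are my assumptions, so treat the numbers as a starting
point and not as a sizing guarantee.

```python
# Rule of thumb quoted above: 10 million files need about 4 GB of NameNode memory.
RULE_FILES = 10_000_000
RULE_BYTES = 4 * 1024**3  # assumption: "4 GB" read as 4 GiB


def estimate_namenode_heap_gb(num_files: int) -> float:
    """Linearly extrapolate the rule of thumb to num_files (hypothetical helper)."""
    bytes_per_file = RULE_BYTES / RULE_FILES  # roughly 430 bytes per file
    return num_files * bytes_per_file / 1024**3


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000, 50_000_000):
        print(f"{n:>12,} files -> ~{estimate_namenode_heap_gb(n):.1f} GB")
```

In practice you would leave headroom on top of the estimate, since the rule
only covers namespace metadata.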
For a small cluster you can have NameNode and JobTracker running on the same
machine.
I'd start off with two DNS entries "namenode" and "jobtracker" both
pointing to the same box. If you need to split the machines later, all
bookmarked URLs and configuration files will remain the same, which will
keep users happier.
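
As a sketch of what that looks like on the configuration side, the snippet
below builds the two classic Hadoop properties from the DNS aliases only. The
helper name and the port numbers (8020 and 8021) are assumptions for
illustration; use whatever ports your installation actually runs on.

```python
# DNS aliases suggested above; both may resolve to the same box at first.
NAMENODE_HOST = "namenode"
JOBTRACKER_HOST = "jobtracker"


def hadoop_site_properties(nn_port: int = 8020, jt_port: int = 8021) -> dict:
    """Return property values that refer only to the aliases (hypothetical helper).

    The default ports are assumed, not taken from the thread.
    """
    return {
        "fs.default.name": f"hdfs://{NAMENODE_HOST}:{nn_port}",
        "mapred.job.tracker": f"{JOBTRACKER_HOST}:{jt_port}",
    }


if __name__ == "__main__":
    for name, value in hadoop_site_properties().items():
        print(f"{name} = {value}")
```

Because no real machine name appears in these values, moving the JobTracker
to its own box later only means repointing the "jobtracker" DNS entry.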