Hello Roman,

If you have a large cluster, it is good to run the JobTracker and NameNode on different machines. If your cluster is small (fewer than roughly 20-30 machines), you can run the JobTracker and NameNode on the same machine. Again, it depends on the hardware configuration. Usually the NameNode and JobTracker machines have a higher hardware specification than the data nodes.
It depends on how big your cluster is and how much data you keep in HDFS. NameNode memory usage is directly proportional to the size of HDFS and the number of files and directories on it, because the metadata and inode information for every file and directory is stored in the NameNode namespace, which is held in main memory. Going by the number of bytes used to store the metadata of each HDFS file in the namespace, the NameNode memory requirement can be summarized as "10 million files require 4 GB of memory for NameNode".

For a small cluster you can have the NameNode and JobTracker running on the same machine.

Regards,
- Ravi

On 7/20/09 6:25 PM, "roman kolcun" <[email protected]> wrote:

Hello everyone,
is there any performance difference (or any advantage / disadvantage) in colocating NameNode and JobTracker on the same node? Is it better to put them on different nodes or on the same one?
Thank you for your answers.
Yours Sincerely,
Roman
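
As a rough illustration of the rule of thumb quoted in the reply above ("10 million files require 4 GB of memory for NameNode"), the estimate can be scaled linearly to other file counts. This is only a sketch: the function name and constants are made up for illustration, and it assumes memory grows strictly in proportion to the number of files, which ignores block counts, directories, and JVM overhead.

```python
# Rule of thumb from the reply: about 4 GB of NameNode memory per 10 million files.
RULE_FILES = 10_000_000
RULE_MEMORY_GB = 4.0


def estimate_namenode_memory_gb(num_files: int) -> float:
    """Scale the rule of thumb linearly to num_files (hypothetical helper)."""
    if num_files < 0:
        raise ValueError("num_files must be non-negative")
    return num_files * RULE_MEMORY_GB / RULE_FILES


if __name__ == "__main__":
    for files in (1_000_000, 10_000_000, 25_000_000):
        gb = estimate_namenode_memory_gb(files)
        print(f"{files:>12,} files -> about {gb:.1f} GB of NameNode memory")
```

If the estimate for your file count fits comfortably alongside the JobTracker's own memory needs on one machine, that is one sign co-locating the two on a small cluster is reasonable; treat the number as a starting point for sizing, not a measurement.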
