NameNode hardware specs

Manish Shah Tue, 12 Aug 2008 11:25:29 -0700

Can someone explain in a little more detail the reasons for the NameNode hardware specs that were recently added to the wiki? I'm interested in learning how others settled on these specs. Are they based on observed behavior, or just recommended by other Hadoop users?


- Use a good server with lots (15GB+) of RAM.

     - Why 15+ GB? Do we allocate all of that memory to the NameNode, or just allocate some amount using -Xmx and leave the rest available so the machine doesn't start swapping?
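For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (file, directory, or block). That figure, the 2 GiB OS reserve, and the example namespace sizes are all assumptions for illustration, not numbers from the wiki.

```python
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory or
# block). This is a widely quoted rule of thumb, not a measured figure.
BYTES_PER_OBJECT = 150
GIB = 1024 ** 3


def estimate_heap_gib(files: int, dirs: int, blocks: int) -> float:
    """Estimated NameNode heap (GiB) needed for namespace metadata alone."""
    return (files + dirs + blocks) * BYTES_PER_OBJECT / GIB


def xmx_fits(xmx_gib: float, physical_gib: float,
             os_reserve_gib: float = 2.0) -> bool:
    """True if the -Xmx heap plus a reserve for the OS and JVM overhead
    (hypothetical 2 GiB default) fits in physical RAM without swapping."""
    return xmx_gib + os_reserve_gib <= physical_gib


if __name__ == "__main__":
    # Hypothetical namespace: 20M files, 1M directories, 25M blocks.
    need = estimate_heap_gib(20_000_000, 1_000_000, 25_000_000)
    print(f"estimated metadata heap: {need:.1f} GiB")
    print("-Xmx8g on a 16 GiB machine leaves headroom:", xmx_fits(8, 16))
```

The sketch keeps the two questions separate: -Xmx is sized from the namespace, and physical RAM is checked against -Xmx plus a reserve.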


- Consider using fast RAID5 storage for keeping the index.
     - Why RAID5?
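As background on what RAID5 provides: it stripes data across the disks and stores one XOR parity chunk per stripe, so the array keeps working after any single disk fails, at the cost of one disk's worth of capacity. The toy sketch below only illustrates the XOR arithmetic. A real controller works on whole stripes and also handles rebuilds, caching, and so on.

```python
from functools import reduce


def parity(chunks: list[bytes]) -> bytes:
    """XOR equal-length chunks byte by byte to produce the parity chunk."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild(surviving: list[bytes]) -> bytes:
    """Recover the single missing chunk: XOR of all surviving chunks
    (remaining data plus parity) equals the lost one."""
    return parity(surviving)


if __name__ == "__main__":
    disks = [b"fsim", b"age ", b"edit"]  # three data chunks of one stripe
    p = parity(disks)                    # stored on a fourth disk
    lost = 1                             # pretend disk 1 died
    survivors = [c for i, c in enumerate(disks) if i != lost] + [p]
    assert rebuild(survivors) == disks[lost]
    print("rebuilt:", rebuild(survivors))
```

The same arithmetic shows the limit: with two chunks missing, one XOR equation cannot recover both.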

- List more than one name node directory in the configuration, so that multiple copies of the indices will be stored. As long as the directories are on separate disks, a single full disk will not corrupt the index.

     - If the index is already on RAID5, why is this necessary?
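One way to sanity-check the "separate disks" condition is to compare the device IDs of the configured directories. The sketch below assumes the directories come from a comma-separated `dfs.name.dir` value. Note that `st_dev` identifies the mounted filesystem, not the physical disk: two directories on one RAID5 volume correctly fail the check, but two partitions on the same physical disk would pass it even though they share a failure.

```python
import os
import sys


def dirs_on_distinct_devices(name_dir_value: str) -> bool:
    """True if a comma-separated list names 2+ directories and each one
    lives on a different device (st_dev), so one full or failed device
    cannot affect every copy of the index."""
    dirs = [d.strip() for d in name_dir_value.split(",") if d.strip()]
    devices = [os.stat(d).st_dev for d in dirs]  # raises if a dir is missing
    return len(dirs) > 1 and len(set(devices)) == len(devices)


if __name__ == "__main__":
    # Usage (hypothetical paths): python check_dirs.py /disk1/name,/disk2/name
    value = sys.argv[1] if len(sys.argv) > 1 else "."
    print(dirs_on_distinct_devices(value))
```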

- Configure the name node to store one set of transaction logs on a separate disk from the index.

     - Why?

- Configure the name node to store another set of transaction logs to a network mounted disk.

     - Why?
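The two transaction-log recommendations can be checked together against the mount table. This is a Linux-only sketch that assumes `/proc/mounts`-style input. The set of filesystem types treated as "network mounted", the directory paths, and the `check_edits_layout` helper are all hypothetical. It reports whether at least one log directory sits on a different device from the index directories, and whether at least one sits on a network mount.

```python
import os

# Assumption: filesystem types treated as "network mounted".
NETWORK_FS = {"nfs", "nfs4", "cifs", "smbfs"}


def mount_entry(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (device, fstype) for the longest mount point containing `path`,
    reading /proc/mounts-style lines: device mountpoint fstype options ..."""
    path = os.path.abspath(path)
    best_mnt, best = "", ("unknown", "unknown")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mnt, fstype = fields[0], fields[1], fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) > len(best_mnt):
            best_mnt, best = mnt, (device, fstype)
    return best


def check_edits_layout(index_dirs: list[str], log_dirs: list[str],
                       mounts_text: str) -> dict[str, bool]:
    """Hypothetical helper: does the transaction-log layout follow both tips?"""
    index_devices = {mount_entry(d, mounts_text)[0] for d in index_dirs}
    logs = [mount_entry(d, mounts_text) for d in log_dirs]
    return {
        # at least one log copy on a device the index does not use
        "separate_from_index": any(dev not in index_devices for dev, _ in logs),
        # at least one log copy on a network filesystem
        "on_network_mount": any(fs in NETWORK_FS for _, fs in logs),
    }


if __name__ == "__main__":
    # Hypothetical mount table; on a live Linux box use open("/proc/mounts").read().
    sample = """\
/dev/md0 / ext3 rw 0 0
/dev/sdc1 /disk2 ext3 rw 0 0
filer:/hadoop /mnt/filer nfs rw 0 0
"""
    print(check_edits_layout(["/hadoop/name"],
                             ["/disk2/edits", "/mnt/filer/edits"], sample))
```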

- Do not host DataNode, JobTracker or TaskTracker services on the same system.

     - How much memory would the JobTracker need? Does it use a lot of CPU? In general, what are good specs for a JobTracker machine, and can that machine be shared with other services?

Thanks so much for the help. I think it would be hugely helpful for the community to start describing their Hadoop cluster setups in more detail than just the DataNode config and cluster size. We all want to be confident that we are spending money on the right machines to grow our clusters the right way.



Most appreciated,

- Manish
Co-Founder Rapleaf.com

We're looking for a product manager, sys admin, and software engineers... $10K referral award
