On Mon, Nov 9, 2009 at 1:04 PM, John Martyniak <[email protected] > wrote:
> Thanks Todd.
>
> I wasn't sure if that is possible. But you pointed out an important point
> and that is it is just NN and JT that would run remotely.
>
> So in order to do this would I just install the complete hadoop instance on
> each one. And then would they be configed as masters?
>
> Or should NameNode and JobTracker run on the same machine? So there would
> be one master.
>

Either way. On all clusters but the largest, the NN and JT are not
significant users of CPU. On medium-sized clusters they can start to use up
multiple GBs of RAM. If you're using fewer than 30 nodes you can *probably*
get by with one machine for both; I say probably because it depends not just
on your total capacity but also on the number of files you have. There are
some rough sizing estimates if you google the archives for "CompressedOops",
I think - someone did some measurements of the NN's memory requirements.

> So when I start the cluster would I start it from the NN/JT machine. Could
> it also be started from any of the other cluster members.
>

It doesn't matter - Hadoop itself doesn't use SSH or anything. The daemons
just all have to be started somehow. If you're using the Cloudera
distribution with RPM/Deb you can use init scripts. If you prefer shell
scripts and ssh you can use the provided start-all scripts, your own
scripts, or something like pdsh or cap shell. If you're a masochist you can
log into each node individually and start the daemons by hand. I do not
recommend this last option :)

> sorry for all of the seemingly basic questions, but want to get it right
> the first time:)
>

Sure thing - we're here to help.
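For the "your own scripts plus ssh" route, here is a minimal sketch of what
that could look like in Python. It is not a tested deployment tool. The
hostnames and the install path are made up for illustration, and it assumes
passwordless ssh to every node and the per-daemon launcher
bin/hadoop-daemon.sh from the Hadoop tarball. As written it only prints the
commands (a dry run); nothing is executed.

```python
import shlex

# Hypothetical install location; change to wherever Hadoop lives on your nodes.
HADOOP_HOME = "/usr/lib/hadoop"

# Hypothetical hostnames. One master runs NN + JT; the workers run DN + TT.
ROLES = {
    "master01": ["namenode", "jobtracker"],
    "worker01": ["datanode", "tasktracker"],
    "worker02": ["datanode", "tasktracker"],
}


def start_commands(roles, hadoop_home=HADOOP_HOME):
    """Return one ssh argument list per (host, daemon) pair, in dict order."""
    launcher = shlex.quote(hadoop_home + "/bin/hadoop-daemon.sh")
    commands = []
    for host, daemons in roles.items():
        for daemon in daemons:
            # The remote command is passed to ssh as a single string.
            commands.append(["ssh", host, f"{launcher} start {daemon}"])
    return commands


if __name__ == "__main__":
    # Dry run: show what would be executed, master daemons first.
    for argv in start_commands(ROLES):
        print(" ".join(argv))
```

To actually launch the daemons you could hand each list to subprocess.run,
and it makes no difference which machine the script is run from.
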
-Todd

>
>
> On Nov 9, 2009, at 1:11 PM, Todd Lipcon wrote:
>
>> On Mon, Nov 9, 2009 at 7:20 AM, John Martyniak <[email protected]> wrote:
>>
>>> Can the NameNode/DataNode & JobTracker/TaskTracker run on a server that
>>> isn't part of the "cluster" meaning I would like to run it on a machine
>>> that wouldn't participate in the processing of data, and wouldn't
>>> participate in the HDFS data sharing, and would solely focus on the
>>> NameNode/DataNode & JobTracker/TaskTracker tasks.
>>>
>>
>> Yes, running the NN and the JT on servers that don't also run TT/DN is
>> very common and recommended for clusters of more than maybe 5 nodes.
>>
>> -Todd
>>
>
