Steve and Todd,
Thanks for the info, it is very helpful. I am going to start setting it
up in this fashion, now that I have the cluster working correctly :)
Good idea with the DNS entries; that will make it easier if I need to
move them to dedicated boxes.
-John
On Nov 11, 2009, at 6:50 AM, Steve Loughran wrote:
> John Martyniak wrote:
>> Thanks Todd.
>> I wasn't sure if that was possible. But you pointed out an important
>> point: it is just the NN and JT that would run remotely.
>> So in order to do this, would I just install the complete Hadoop
>> instance on each one? And then would they be configured as masters?
> Yes. Same config, just run the namenode and jobtracker alone.
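A minimal sketch of the "same config everywhere" point, assuming a 0.20-era layout where the identical conf/ directory is copied to every box; the worker hostnames below are placeholders. The conf/slaves file only tells the start scripts which hosts run the DataNode/TaskTracker daemons — the masters differ solely in which daemons you start on them.

```shell
# Same conf/ directory ships to every node; only which daemons you start
# on each host differs. "worker1..3" are placeholder hostnames.
mkdir -p conf
cat > conf/slaves <<'EOF'
worker1
worker2
worker3
EOF
```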
>> Or should the NameNode and JobTracker run on the same machine, so
>> there would be one master?
> They can do that; you may want to separate them later, depending on
> the size of the filesystem and the NN's memory requirements.
> One trick here is to add a couple of DNS entries -namenode and
> jobtracker- that initially point to the same physical host; then if
> you split the machines up, change the DNS entries and everyone who
> reconnects gets the new machines. No need to edit every bookmark or
> config file.
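The DNS trick in config form, as a hedged sketch: point the 0.20-era properties fs.default.name and mapred.job.tracker at the DNS aliases rather than a raw hostname (the ports and the alias names here are assumptions; pick whatever your site uses).

```shell
# Point the HDFS and MapReduce master URIs at DNS aliases.
# "namenode"/"jobtracker" are the aliases from the trick above; if the
# daemons later move to dedicated boxes, only the DNS entries change.
mkdir -p conf
cat > conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
  </property>
</configuration>
EOF
cat > conf/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker:9001</value>
  </property>
</configuration>
EOF
```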
>> So when I start the cluster, would I start it from the NN/JT
>> machine? Could it also be started from any of the other cluster
>> members?
> In a big cluster you would normally use some CM tool or init.d
> scripts to start the processes.
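What an init.d-style per-host start would invoke, sketched as a dry-run wrapper that just echoes the hadoop-daemon.sh command for each role (the wrapper function itself is hypothetical; bin/hadoop-daemon.sh is the real per-daemon start script in 0.20-era Hadoop).

```shell
# Dry-run sketch: echo the command each host role would run, rather
# than starting real daemons. start_role is a hypothetical helper.
start_role() {
  case "$1" in
    namenode|jobtracker|datanode|tasktracker)
      echo "bin/hadoop-daemon.sh start $1" ;;
    *)
      echo "unknown role: $1" >&2; return 1 ;;
  esac
}
start_role namenode     # on the NN box (or combined NN/JT box)
start_role jobtracker   # on the JT box
start_role datanode     # on every worker
start_role tasktracker  # on every worker
```

Because each host only starts its own daemons, there is no requirement that everything be launched from one machine; the ssh-based start-all.sh convenience script is just one way to fan those commands out.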
> All datanodes block until the NN is live; all TTs block for the JT;
> the JT (in 0.21+) blocks waiting for the filesystem.