John Martyniak wrote:
> Thanks Todd.
> I wasn't sure if that was possible, but you made an important
> point: it is just the NN and JT that would run remotely.
> So to do this, would I just install the complete Hadoop instance
> on each one? And then would they be configured as masters?
Yes. Same config everywhere; just run only the namenode and jobtracker daemons on those machines.
> Or should the NameNode and JobTracker run on the same machine, so there
> would be one master?
They can do that. You may want to separate them later; it depends on the
size of the filesystem and the NN's memory requirements.
One trick here is to add a couple of DNS entries, namenode and jobtracker,
that initially point to the same physical host. Then, if you split the
machines up, change the DNS entries and everyone who reconnects gets the
new machines. There is no need to edit every bookmark or config file.
> So when I start the cluster, would I start it from the NN/JT machine?
> Could it also be started from any of the other cluster members?
In a big cluster you would normally use a configuration management (CM)
tool or init.d scripts to start the processes.
All datanodes block until the NN is live, all TaskTrackers (TTs) block
until the JT is live, and the JT (in 0.21+) blocks waiting for the
filesystem, so the order in which the processes are started is not critical.
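Because each daemon waits for what it depends on, the daemons can be launched in any order and settle into a dependency order as they come up. A minimal Python sketch of that order, assuming the simplified dependency table below (the `hdfs` node is an illustrative stand-in for "the filesystem is usable", treated here as needing the NN and the datanodes); the names are labels, not Hadoop commands.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# daemon -> the things it blocks on before it is fully live.
# Simplification: "hdfs" stands in for a usable filesystem.
WAITS_FOR = {
    "namenode": set(),
    "datanode": {"namenode"},            # datanodes block until the NN is live
    "hdfs": {"namenode", "datanode"},    # assumption: FS usable once both are up
    "jobtracker": {"hdfs"},              # JT waits for the filesystem (0.21+)
    "tasktracker": {"jobtracker"},       # TTs block until the JT is live
}


def readiness_order(waits_for=WAITS_FOR):
    """Return an order in which every daemon's dependencies are live first."""
    return list(TopologicalSorter(waits_for).static_order())


if __name__ == "__main__":
    print(readiness_order())
```

This is the order in which things become ready, not an order you have to enforce by hand; a CM tool or init.d scripts can start everything and let the blocking sort it out.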