Steve and Todd,

Thanks for the info; it is very helpful. I am going to start setting it up this way, now that I have the cluster working correctly :)

Good idea with the DNS entries; that will make it easier if I need to move them to dedicated boxes.

-John

On Nov 11, 2009, at 6:50 AM, Steve Loughran wrote:

John Martyniak wrote:
Thanks, Todd.
I wasn't sure if that was possible, but you made an important point: it is just the NN and JT that would run remotely. So to do this, would I just install the complete Hadoop instance on each one? And then would they be configured as masters?

Yes. Same config; just run the NameNode and JobTracker alone.
Or should the NameNode and JobTracker run on the same machine, so there would be one master?

They can do that. You may want to separate them later; it depends on the size of the filesystem and the NN's memory requirements.

One trick here is to add a couple of DNS entries, namenode and jobtracker, that initially point to the same physical host. Then, if you split the machines up, change the DNS entries and everyone who reconnects gets the new machines; there is no need to edit every bookmark or config file.
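A minimal Python sketch of the DNS-alias idea, assuming Hadoop 0.20-era property names (`fs.default.name`, `mapred.job.tracker`) and the commonly used ports 8020/8021; the alias constants and both helper functions are hypothetical. Every node gets the same settings, which name the aliases rather than a physical machine, so splitting the masters later only means changing DNS.

```python
import socket
from typing import Callable, Dict

# Hypothetical logical hostnames: DNS aliases that initially point at the same box.
NAMENODE_ALIAS = "namenode"
JOBTRACKER_ALIAS = "jobtracker"


def master_config(nn_port: int = 8020, jt_port: int = 8021) -> Dict[str, str]:
    """Build shared 0.20-style settings that name the aliases, not physical hosts.

    The port numbers are assumptions; use whatever your cluster actually listens on.
    """
    return {
        "fs.default.name": f"hdfs://{NAMENODE_ALIAS}:{nn_port}",
        "mapred.job.tracker": f"{JOBTRACKER_ALIAS}:{jt_port}",
    }


def masters_colocated(resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True while both aliases resolve to the same address (one master box)."""
    return resolve(NAMENODE_ALIAS) == resolve(JOBTRACKER_ALIAS)
```

For example, with a fake resolver standing in for DNS, `masters_colocated({"namenode": "10.0.0.5", "jobtracker": "10.0.0.5"}.__getitem__)` is `True`; after repointing `jobtracker` at another address it becomes `False`, while `master_config()` stays unchanged on every node.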


So when I start the cluster, would I start it from the NN/JT machine? Could it also be started from any of the other cluster members?

In a big cluster you would normally use a configuration management (CM) tool or init.d scripts to start the processes.
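As a rough illustration of that approach, here is a Python sketch of the role map a CM tool might hold, turned into the commands each host runs locally. The host names, the role assignment, and the `/usr/local/hadoop` install path are hypothetical; `bin/hadoop-daemon.sh start <daemon>` is the per-daemon start script shipped with Hadoop 0.20.

```python
from typing import Dict, List

# Hypothetical role assignment; for now one box runs both masters.
ROLES: Dict[str, List[str]] = {
    "master1": ["namenode", "jobtracker"],
    "worker1": ["datanode", "tasktracker"],
    "worker2": ["datanode", "tasktracker"],
}


def start_commands(
    roles: Dict[str, List[str]], hadoop_home: str = "/usr/local/hadoop"
) -> Dict[str, List[str]]:
    """Map each host to the daemon start commands it should run on itself."""
    return {
        host: [f"{hadoop_home}/bin/hadoop-daemon.sh start {daemon}" for daemon in daemons]
        for host, daemons in roles.items()
    }
```

The design point is that each machine starts only its own daemons, so no single node has to drive the whole cluster.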

All DataNodes block until the NN is live, all TaskTrackers (TTs) block waiting for the JT, and the JT (in 0.21+) blocks waiting for the filesystem.
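Because each daemon waits for the ones it depends on, the processes can be launched in any order and still come up in a consistent sequence. A small Python sketch of that dependency graph, using the standard-library `graphlib` (Python 3.9+); the `WAITS_FOR` map is an illustrative simplification that models "the filesystem" as just the NameNode.

```python
from graphlib import TopologicalSorter

# Each daemon maps to the daemons it blocks on.
WAITS_FOR = {
    "namenode": set(),
    "datanode": {"namenode"},
    "jobtracker": {"namenode"},  # 0.21+: the JT waits for the filesystem (simplified)
    "tasktracker": {"jobtracker"},
}


def effective_start_order(waits_for):
    """Return one order in which the daemons become usable.

    Launch order does not matter; the blocking waits impose this order anyway.
    """
    return list(TopologicalSorter(waits_for).static_order())
```

Any order returned puts the NameNode first and each TaskTracker after the JobTracker, regardless of which processes the init scripts happened to launch first.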