I am starting an install of Hadoop on a new cluster. However, I am a little confused about which set of instructions I should follow, having only installed and played around with Hadoop on a single-node Ubuntu box with 2 cores (on a single board) and 2 GB of RAM. The new machine has 2 internal nodes, each with 4 cores. I would like to run Hadoop in a distributed context over these 8 cores.
One of my biggest issues is the definition of the word "node". From the Hadoop wiki and documentation, it seems that "node" means "machine", and not a board. So, by this definition, our cluster is really one "node". Is this correct?
If this is the case, then I shouldn't be using the "cluster setup" instructions, located here:
http://hadoop.apache.org/core/docs/r0.17.2/cluster_setup.html
but rather these:
http://hadoop.apache.org/core/docs/r0.17.2/quickstart.html
which is what I've been doing.
But which mode of operation should I use? I don't think it should be standalone. Should it be pseudo-distributed? If so, how can I guarantee that the work will be spread over all 8 processors? What is necessary for the hadoop-site.xml file?
Here are the specs of the machine:
-Mac Pro RAID Card 065-7214
-Two 3.0GHz Quad-Core Intel Xeon (8-core) 065-7534
-16GB RAM (4 x 4GB) 065-7179
-1TB 7200-rpm Serial ATA 3Gb/s 065-7544
-1TB 7200-rpm Serial ATA 3Gb/s 065-7546
-1TB 7200-rpm Serial ATA 3Gb/s 065-7193
-1TB 7200-rpm Serial ATA 3Gb/s 065-7548
Could someone please point me to the correct mode of operation and the instructions for installing things correctly on this machine? I found some information in the archives on how to install on an OS X machine, but it is a touch outdated and seems to be missing some things.
Thank you very much for your time.
-SM
