Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/VirtualCluster

The comment on the change is:
new page on bringing up virtual clusters

New page:
= Virtual Clusters =

Hadoop clusters can be run on virtual machines. This works, but performance 
(especially I/O to virtualized disks) can be significantly slower than on 
physical machines.


== Process to set up a cluster using the Hadoop shell scripts ==

 i. All machines should be brought up to date, with the same version of Java, 
Hadoop, etc. 
 i. The public key of every machine (both VMs and physical machines) must be 
added to the "~/.ssh/authorized_keys" file on every machine.
 i. The conf/hadoop-site.xml file must be the same on all machines.
 i. The /etc/hosts file on each machine must list the IP address and hostname 
of every machine (VMs and physical machines).
 i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any 
other loopback address (some laptop-friendly Unix distributions do this). It 
should point to the machine's assigned IP address.
 i. conf/slaves must contain the hostnames of all slaves, both VMs and 
physical machines.
 i. conf/masters must contain only the master's hostname.
 i. The conf/masters and conf/slaves files must be the same on all 
participating machines.
 i. The Hadoop home directory should be the same on all machines, for example 
"/root/hadoop-0.18.2/".
 i. Finally, run the namenode format command (bin/hadoop namenode -format) on 
the master machine only.
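
The /etc/hosts rules above can be checked mechanically before starting the 
cluster. What follows is a minimal Python sketch, not part of Hadoop: the 
function names (parse_hosts, is_loopback, check_hosts) are made up for 
illustration, and it assumes the standard whitespace-separated hosts format 
with "#" comments. It reports a local hostname that maps to a loopback address 
and any cluster hostname that has no entry.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in /etc/hosts-style text to its addresses."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def is_loopback(address):
    """True for 127.0.0.0/8 and ::1; False for anything unparseable."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def check_hosts(hosts_text, local_hostname, cluster_hostnames):
    """Return a list of problems found; an empty list means none were found."""
    mapping = parse_hosts(hosts_text)
    problems = []
    # The local hostname must resolve to the assigned IP, not to loopback.
    for address in mapping.get(local_hostname, []):
        if is_loopback(address):
            problems.append(
                f"{local_hostname} maps to loopback address {address}")
    # Every machine in the cluster needs an entry.
    for name in cluster_hostnames:
        if name not in mapping:
            problems.append(f"{name} has no entry")
    return problems
```

On a real node this might be called with the contents of /etc/hosts, the 
value of socket.gethostname(), and the hostnames listed in conf/masters and 
conf/slaves. It only inspects the file; it does not consult DNS.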


Most of this configuration can be done once on a single machine image; the 
problematic area is host naming. Virtual networks with reverse DNS allow the 
hostname to be picked up from the network. If you use this:
 1. The namenode and job tracker hostnames should remain constant; these need 
separate images.
 1. Bring up the other machines afterwards; use an empty conf/slaves file.
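
Whether the virtual network's reverse DNS returns the name you expect for the 
namenode and job tracker can be tested from any node. This is a minimal Python 
sketch under stated assumptions: the function name reverse_dns_matches is 
hypothetical, names are compared case-insensitively, and no attempt is made to 
reconcile a short name with a fully qualified one unless the resolver returns 
both. The resolver is a parameter so the check can be exercised without a 
network.

```python
import socket


def reverse_dns_matches(ip_address, expected_hostname,
                        resolver=socket.gethostbyaddr):
    """Return True if a reverse lookup of ip_address yields expected_hostname.

    resolver must behave like socket.gethostbyaddr: it returns a
    (hostname, alias_list, ip_list) tuple or raises OSError on failure.
    """
    try:
        name, aliases, _ = resolver(ip_address)
    except OSError:
        # No PTR record, or the lookup failed for another reason.
        return False
    candidates = {name.lower()}
    candidates.update(alias.lower() for alias in aliases)
    return expected_hostname.lower() in candidates
```

A False result for the master's address suggests that its hostname is not 
stable on that network and that slaves may fail to find it after a restart.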

== Trouble Areas ==

Here are things that can cause trouble.
 1. Multiple virtual network adapters. It is simpler with one network 
adapter per node.
 1. Machines changing hostname or IP address on a reboot. For a long-lived 
virtual cluster you need stable machine names.
 1. Pauses of an entire VM for 5-10s or longer. This happens when the virtual 
host is overloaded and your VM has been swapped out. Host fewer VMs, or have 
them ask for less memory.
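
One rough way to see whether a VM is being paused is to sleep for a short, 
fixed interval in a loop and measure how much longer than requested each sleep 
took. The Python sketch below is an illustration only: the function name 
detect_pauses and its defaults are assumptions, and whether the guest's clock 
reflects a pause at all depends on the hypervisor and its timekeeping 
settings, so a quiet result does not prove that no pauses occurred.

```python
import time


def detect_pauses(interval=1.0, threshold=5.0, iterations=60,
                  clock=time.monotonic, sleep=time.sleep):
    """Sleep repeatedly and collect overruns of at least `threshold` seconds.

    An overrun is the time a sleep took beyond the requested `interval`.
    `clock` and `sleep` are parameters so the loop can be tested with fakes;
    time.monotonic is an assumed default, not a recommendation.
    """
    pauses = []
    last = clock()
    for _ in range(iterations):
        sleep(interval)
        now = clock()
        overrun = (now - last) - interval
        if overrun >= threshold:
            pauses.append(overrun)
        last = now
    return pauses
```

With the defaults this watches for about a minute and returns the length, in 
seconds, of every stall of 5s or more that it observed.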
