Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by SteveLoughran:
http://wiki.apache.org/hadoop/VirtualCluster

The comment on the change is:
more troublespots.

------------------------------------------------------------------------------
   i. The public keys of all machines (both VMs and physical machines) are distributed to every machine's "~/.ssh/authorized_keys" file.
   i. The conf/hadoop-site.xml file is similar on all the machines.
   i. The /etc/hosts file must contain the IP address and hostname of every machine (VM and physical).
-  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback address (some laptop-friendly Unix distributions do this). It should point to the assigned IP address.
+  i. The local hostname entry in /etc/hosts must not point to 127.0.0.1 or any other loopback address (some laptop-friendly Linux distributions do this). It should point to the assigned IP address.
   i. conf/slaves must contain the hostnames of all slaves, including VMs and physical machines.
   i. conf/masters must contain only the master's hostname.
   i. Both the conf/masters and conf/slaves files must be similar on all the participating machines.

@@ -28, +28 @@

  Here are things that can cause trouble.
   1. Multiple virtual network adapters. It is simpler with one network adapter per node.
   1. Machines changing hostname/IP address on a reboot. For a long-lived virtual cluster you need stable machine names.
+  1. Machines whose hostname doesn't match the hostname the network assigns them. The machine thinks it is "granton", the network thinks it is "dhcp-169-45", that being the name everything else talks to it by.
+  1. Machines that think they have the same hostname. You get this if you clone VMs and don't rename them.
   1. Pauses of an entire VM for 5-10s or longer. This happens when the virtual host is overloaded and your VM has been swapped out. Host fewer VMs, or have them ask for less memory.
+  1. Weird clock drift, where the clock can even run backwards. Again, don't overload your machines.
+  1. All redundant virtual servers (e.g. Namenode and secondary NN) being hosted on the same physical machine. At that point, you don't have redundancy or failover any more.
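
The /etc/hosts rules above (no loopback address for the local hostname, and an entry for every machine in the cluster) can be checked mechanically. What follows is a minimal sketch in Python, not part of the wiki page: the function names parse_hosts and check_hosts are hypothetical, and the list of cluster hostnames is assumed to be supplied by the caller, for example from the lines of conf/slaves and conf/masters.

```python
import ipaddress


def parse_hosts(text):
    """Return a list of (ip, names) pairs from /etc/hosts-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # skip lines that do not start with a parseable IP address
        entries.append((ip, fields[1:]))
    return entries


def check_hosts(hosts_text, local_hostname, cluster_hostnames):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it checks only the two /etc/hosts rules named in
    the text, not the rest of the cluster configuration.
    """
    entries = parse_hosts(hosts_text)
    problems = []
    # Rule 1: the local hostname must not resolve to a loopback address.
    for ip, names in entries:
        if local_hostname in names and ip.is_loopback:
            problems.append(f"{local_hostname} maps to loopback address {ip}")
    # Rule 2: every cluster machine needs an entry in the hosts file.
    known = {name for _, names in entries for name in names}
    for host in cluster_hostnames:
        if host not in known:
            problems.append(f"{host} has no entry in the hosts file")
    return problems
```

To run it against a real machine, one would pass in the contents of /etc/hosts, the value of socket.gethostname(), and the hostnames read from conf/slaves and conf/masters. The check is deliberately text-based, so it can also be run on a copy of the hosts file before the file is distributed to the VMs.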
