Thanks a lot for the reply. I will go over the scripts to see the details. I am still a little confused about which process starts the hdfs and mapreduce daemons: the ringmaster or the hodring?
And I am also wondering how the hadoop daemons work once they are up. Do they communicate in the same way as they would without HOD (the daemons talking directly to each other), or does everything have to go through the ringmaster and hodring? Thanks a lot for the time and help!

Boyu

On Mon, Apr 12, 2010 at 10:52 PM, Kevin Van Workum <[email protected]> wrote:

> On Mon, Apr 12, 2010 at 8:52 PM, Boyu Zhang <[email protected]> wrote:
> > Hi Kevin,
> >
> > Sorry to bother again, I am wondering, in order to get HOD to work, do we
> > need to install all the prerequisite software like passwordless ssh?
> > Thanks a lot!
>
> SSH is not needed for HOD; it uses pbsdsh to launch processes on the nodes.
>
> HOD seems to be very sensitive about the python version: 2.4 and 2.6
> don't work, you need 2.5.
>
> HOD is a little more flexible with Java; 1.5 and 1.6 both seem to work
> for me. Also, the most recent versions of Twisted and zope seem to be
> fine.
>
> > Boyu
> >
> > On Tue, Apr 6, 2010 at 10:43 AM, Kevin Van Workum <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I'm trying to set up Hadoop on Demand (HOD) on my cluster. I'm
> >> currently unable to "allocate cluster". I'm starting hod with the
> >> following command:
> >>
> >> /usr/local/hadoop-0.20.2/hod/bin/hod -c
> >> /usr/local/hadoop-0.20.2/hod/conf/hodrc -t
> >> /b/01/vanw/hod/hadoop-0.20.2.tar.gz -o "allocate ~/hod 3"
> >> --ringmaster.log-dir=/tmp -b 4
> >>
> >> The job starts on the nodes and I see the ringmaster running on the
> >> MotherSuperior. The ringmaster-main.log file is created, but is empty.
> >> I don't see any associated processes running on the other 2 nodes in
> >> the job.
> >>
> >> The critical errors are as follows:
> >>
> >> [2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
> >> 'hdfs' service address.
> >> [2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
> >> 238366.jman, as cluster could not be allocated.
> >> [2010-04-06 10:34:13,632] DEBUG/10 hadoop:635 - Calling rm.stop()
> >> [2010-04-06 10:34:13,639] DEBUG/10 hadoop:637 - Returning from rm.stop()
> >> [2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
> >> cluster /b/01/vanw/hod
> >> [2010-04-06 10:34:14,149] DEBUG/10 hod:597 - return code: 7
> >>
> >> The contents of the hodrc file are:
> >>
> >> [hod]
> >> stream = True
> >> java-home = /usr/local/jdk1.6.0_02
> >> cluster = orange
> >> cluster-factor = 1.8
> >> xrs-port-range = 32768-65536
> >> debug = 4
> >> allocate-wait-time = 3600
> >> temp-dir = /tmp/hod
> >>
> >> [ringmaster]
> >> register = True
> >> stream = False
> >> temp-dir = /tmp/hod
> >> http-port-range = 8000-9000
> >> work-dirs = /tmp/hod/1,/tmp/hod/2
> >> xrs-port-range = 32768-65536
> >> debug = 4
> >>
> >> [hodring]
> >> stream = False
> >> temp-dir = /tmp/hod
> >> register = True
> >> java-home = /usr/local/jdk1.6.0_02
> >> http-port-range = 8000-9000
> >> xrs-port-range = 32768-65536
> >> debug = 4
> >>
> >> [resource_manager]
> >> queue = dque
> >> batch-home = /usr/local/torque-2.3.7
> >> id = torque
> >> env-vars = HOD_PYTHON_HOME=/usr/local/python-2.5.5/bin/python
> >>
> >> [gridservice-mapred]
> >> external = False
> >> pkgs = /usr/local/hadoop-0.20.2
> >> tracker_port = 8030
> >> info_port = 50080
> >>
> >> [gridservice-hdfs]
> >> external = False
> >> pkgs = /usr/local/hadoop-0.20.2
> >> fs_port = 8020
> >> info_port = 50070
> >>
> >> Some other useful information:
> >> Linux 2.6.18-128.7.1.el5
> >> Python 2.5.5
> >> Twisted 10.0.0
> >> zope 3.3.0
> >> java version "1.6.0_02"
> >>
> >> --
> >> Kevin Van Workum, PhD
> >> Sabalcore Computing Inc.
> >> Run your code on 500 processors.
> >> Sign up for a free trial account.
> >> www.sabalcore.com
> >> 877-492-8027 ext. 11
>
> --
> Kevin Van Workum, PhD
> Sabalcore Computing Inc.
> Run your code on 500 processors.
> Sign up for a free trial account.
> www.sabalcore.com
> 877-492-8027 ext. 11
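
Kevin's two prerequisites above, pbsdsh reachability and Python 2.5 on every compute node, can be checked before involving HOD at all. Below is a minimal sanity-check Torque job; it is my own sketch rather than anything shipped with HOD, the script name is invented, and the queue and Python path are copied from the hodrc above, so adjust for your site.

#!/bin/sh
# hod-sanity.sh -- hypothetical pre-flight check; submit with: qsub hod-sanity.sh
#PBS -N hod-sanity
#PBS -q dque
#PBS -l nodes=3

# HOD launches its per-node processes with pbsdsh rather than ssh,
# so pbsdsh must be able to run a command on every allocated node:
pbsdsh hostname

# HOD reportedly needs Python 2.5 everywhere (2.4 and 2.6 fail),
# so print the version each node would actually use:
pbsdsh /usr/local/python-2.5.5/bin/python -V

If pbsdsh hostname does not print one line per allocated slot, HOD has no working way to start anything on the other nodes, which would be consistent with the empty ringmaster-main.log and missing processes described above.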

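A side note on the hod operations themselves: if I am reading the 0.20 HOD user guide correctly, the same -o operation syntax Kevin uses for allocate also accepts list, info, and deallocate, which are handy when debugging a failed allocation. The paths below are simply Kevin's from earlier in the thread:

/usr/local/hadoop-0.20.2/hod/bin/hod -c /usr/local/hadoop-0.20.2/hod/conf/hodrc -o "list"
/usr/local/hadoop-0.20.2/hod/bin/hod -c /usr/local/hadoop-0.20.2/hod/conf/hodrc -o "info ~/hod"
/usr/local/hadoop-0.20.2/hod/bin/hod -c /usr/local/hadoop-0.20.2/hod/conf/hodrc -o "deallocate ~/hod"

"list" shows the clusters you have allocated, "info" prints the service addresses recorded for a cluster directory (the same lookup that failed above with "Failed to retrieve 'hdfs' service address"), and "deallocate" releases the underlying Torque job.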