On Thu, Apr 8, 2010 at 2:23 PM, Boyu Zhang <[email protected]> wrote:
> Hi Kevin,
>
> I am having the same error, but my critical error is:
>
> [2010-04-08 13:47:25,304] CRITICAL/50 hadoop:303 - Cluster could not be
> allocated because of the following errors.
> Hodring at n0 failed with following errors:
> JobTracker failed to initialise
>
> Have you solved this? Thanks!

Yes, I was about to post my solution. In my case the issue was that
log-dir defaults to the "log" directory under the HOD installation.
Since I didn't have permission to write to this directory, HDFS
couldn't initialize. Setting "log-dir = logs" for [hod], [ringmaster],
[hodring], [gridservice-mapred], and [gridservice-hdfs] in hodrc fixed
the problem by writing the logs to the "logs" directory under the CWD.

Also, I have managed to get HOD to use the hod.cluster setting from
hodrc to set the node properties for the qsub command. I'm going to
clean up my modifications and post them in the next day or two.

Kevin

>
> Boyu
>
> On Tue, Apr 6, 2010 at 11:32 AM, Kevin Van Workum <[email protected]> wrote:
>
>> [sorry for the double posting (to general), but I think this list is
>> the appropriate place for this message]
>>
>> Hello,
>>
>> I'm trying to setup hadoop on demand (HOD) on my cluster. I'm
>> currently unable to "allocate cluster". I'm starting hod with the
>> following command:
>>
>> /usr/local/hadoop-0.20.2/hod/bin/hod -c
>> /usr/local/hadoop-0.20.2/hod/conf/hodrc -t
>> /b/01/vanw/hod/hadoop-0.20.2.tar.gz -o "allocate ~/hod 3"
>> --ringmaster.log-dir=/tmp -b 4
>>
>> The job starts on the nodes and I see the ringmaster running on the
>> MotherSuperior. The ringmaster-main.log file is created and contains:
>>
>> [2010-04-06 11:18:29,036] DEBUG/10 ringMaster:487 - getServiceAddr
>> service: <hodlib.GridServices.mapred.MapReduce instance at 0x12b42518>
>> [2010-04-06 11:18:29,038] DEBUG/10 ringMaster:504 - getServiceAddr
>> addr mapred: not found
>> [2010-04-06 10:47:43,183] DEBUG/10 ringMaster:479 - getServiceAddr name:
>> hdfs
>> [2010-04-06 10:47:43,184] DEBUG/10 ringMaster:487 - getServiceAddr
>> service: <hodlib.GridServices.hdfs.Hdfs instance at 0x122d24d0>
>> [2010-04-06 10:47:43,186] DEBUG/10 ringMaster:504 - getServiceAddr
>> addr hdfs: not found
>>
>> I don't see any associated processes running on the other 2 nodes in
>> the job.
>>
>> The critical errors are as follows:
>>
>> [2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
>> 'hdfs' service address.
>> [2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
>> 238366.jman, as cluster could not be allocated.
>> [2010-04-06 10:34:13,632] DEBUG/10 hadoop:635 - Calling rm.stop()
>> [2010-04-06 10:34:13,639] DEBUG/10 hadoop:637 - Returning from rm.stop()
>> [2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
>> cluster /b/01/vanw/hod
>> [2010-04-06 10:34:14,149] DEBUG/10 hod:597 - return code: 7
>>
>> The contents of the hodrc file is:
>>
>> [hod]
>> stream = True
>> java-home = /usr/local/jdk1.6.0_02
>> cluster = orange
>> cluster-factor = 1.8
>> xrs-port-range = 32768-65536
>> debug = 4
>> allocate-wait-time = 3600
>> temp-dir = /tmp/hod
>>
>> [ringmaster]
>> register = True
>> stream = False
>> temp-dir = /tmp/hod
>> http-port-range = 8000-9000
>> work-dirs = /tmp/hod/1,/tmp/hod/2
>> xrs-port-range = 32768-65536
>> debug = 4
>>
>> [hodring]
>> stream = False
>> temp-dir = /tmp/hod
>> register = True
>> java-home = /usr/local/jdk1.6.0_02
>> http-port-range = 8000-9000
>> xrs-port-range = 32768-65536
>> debug = 4
>>
>> [resource_manager]
>> queue = dque
>> batch-home = /usr/local/torque-2.3.7
>> id = torque
>> env-vars =
>> HOD_PYTHON_HOME=/usr/local/python-2.5.5/bin/python
>>
>> [gridservice-mapred]
>> external = False
>> tracker_port = 8030
>> info_port = 50080
>>
>> [gridservice-hdfs]
>> external = False
>> fs_port = 8020
>> info_port = 50070
>>
>>
>> Some other useful information:
>> Linux 2.6.18-128.7.1.el5
>> Python 2.5.5
>> Twisted 10.0.0
>> zope 3.3.0
>> java version "1.6.0_02"
>> hadoop version 0.20.2
>>
>>
>>
>> --
>> Kevin Van Workum, PhD
>> Sabalcore Computing Inc.
>> Run your code on 500 processors.
>> Sign up for a free trial account.
>> www.sabalcore.com
>> 877-492-8027 ext. 11
>>
>

--
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11
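
For reference, the log-dir change described in the reply could be
sketched as the hodrc fragment below. Only the log-dir lines are new;
the other options in each section stay as they are in the quoted
hodrc. This assumes a writable "logs" directory under the directory
hod is started from, since the relative path is resolved against the
CWD.

```ini
[hod]
log-dir = logs

[ringmaster]
log-dir = logs

[hodring]
log-dir = logs

[gridservice-mapred]
log-dir = logs

[gridservice-hdfs]
log-dir = logs
```

An absolute path to any directory the submitting user can write to
should presumably work as well; the relative "logs" value is simply
what was used here.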
