[sorry for the double posting (to general), but I think this list is
the appropriate place for this message]

Hello,

I'm trying to set up Hadoop On Demand (HOD) on my cluster, but I'm
currently unable to allocate a cluster. I'm starting hod with the
following command:

/usr/local/hadoop-0.20.2/hod/bin/hod \
    -c /usr/local/hadoop-0.20.2/hod/conf/hodrc \
    -t /b/01/vanw/hod/hadoop-0.20.2.tar.gz \
    -o "allocate ~/hod 3" \
    --ringmaster.log-dir=/tmp -b 4

The job starts on the nodes and I see the ringmaster running on the
MotherSuperior. The ringmaster-main.log file is created and contains:

[2010-04-06 11:18:29,036] DEBUG/10 ringMaster:487 - getServiceAddr
service: <hodlib.GridServices.mapred.MapReduce instance at 0x12b42518>
[2010-04-06 11:18:29,038] DEBUG/10 ringMaster:504 - getServiceAddr
addr mapred: not found
[2010-04-06 10:47:43,183] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
[2010-04-06 10:47:43,184] DEBUG/10 ringMaster:487 - getServiceAddr
service: <hodlib.GridServices.hdfs.Hdfs instance at 0x122d24d0>
[2010-04-06 10:47:43,186] DEBUG/10 ringMaster:504 - getServiceAddr
addr hdfs: not found

I don't see any associated processes running on the other 2 nodes in
the job.

The critical errors are as follows:

[2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
'hdfs' service address.
[2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
238366.jman, as cluster could not be allocated.
[2010-04-06 10:34:13,632] DEBUG/10 hadoop:635 - Calling rm.stop()
[2010-04-06 10:34:13,639] DEBUG/10 hadoop:637 - Returning from rm.stop()
[2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
cluster /b/01/vanw/hod
[2010-04-06 10:34:14,149] DEBUG/10 hod:597 - return code: 7
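
For anyone who wants to pull the CRITICAL entries out of a longer log,
here is a minimal sketch. It assumes every record follows the layout
seen in the excerpts above ([timestamp] LEVEL/number source:line -
message) and that a line not starting with "[" is a mail-wrapped
continuation of the previous record. The names RECORD, parse_records
and critical_messages are my own and are not part of HOD.

```python
import re

# One record, as seen in the excerpts above:
# [timestamp] LEVEL/number source:line - message
RECORD = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)/(?P<num>\d+)\s+"
    r"(?P<src>\w+):(?P<line>\d+)\s+-\s+(?P<msg>.*)$"
)


def parse_records(text):
    """Join wrapped lines and return a list of dicts, one per log record."""
    joined = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") or not joined:
            joined.append(line)
        else:
            # Assumed to be a continuation of the previous record.
            joined[-1] += " " + line
    records = []
    for line in joined:
        match = RECORD.match(line)
        if match:
            records.append(match.groupdict())
    return records


def critical_messages(records):
    """Return the message text of every CRITICAL record, in order."""
    return [r["msg"] for r in records if r["level"] == "CRITICAL"]


SAMPLE = """\
[2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
'hdfs' service address.
[2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
238366.jman, as cluster could not be allocated.
[2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
cluster /b/01/vanw/hod
"""

if __name__ == "__main__":
    for msg in critical_messages(parse_records(SAMPLE)):
        print(msg)
```

The same parse_records() output can be filtered on the "src" field
(for example "ringMaster") to look at one component at a time.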

The contents of the hodrc file are:

[hod]
stream                          = True
java-home                       = /usr/local/jdk1.6.0_02
cluster                         = orange
cluster-factor                  = 1.8
xrs-port-range                  = 32768-65536
debug                           = 4
allocate-wait-time              = 3600
temp-dir                        = /tmp/hod

[ringmaster]
register                        = True
stream                          = False
temp-dir                        = /tmp/hod
http-port-range                 = 8000-9000
work-dirs                       = /tmp/hod/1,/tmp/hod/2
xrs-port-range                  = 32768-65536
debug                           = 4

[hodring]
stream                          = False
temp-dir                        = /tmp/hod
register                        = True
java-home                       = /usr/local/jdk1.6.0_02
http-port-range                 = 8000-9000
xrs-port-range                  = 32768-65536
debug                           = 4

[resource_manager]
queue                           = dque
batch-home                      = /usr/local/torque-2.3.7
id                              = torque
env-vars                        = HOD_PYTHON_HOME=/usr/local/python-2.5.5/bin/python

[gridservice-mapred]
external                        = False
tracker_port                    = 8030
info_port                       = 50080

[gridservice-hdfs]
external                        = False
fs_port                         = 8020
info_port                       = 50070
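
Since the same hodrc is used on every node in the job, one thing that
can be checked per node is whether the paths it names actually exist
there. Below is a minimal sketch using Python's standard configparser
module (Python 3, not the 2.5 interpreter HOD itself runs under). The
list of path-valued options is my own selection from the file above,
not an official HOD list, and the helper names are hypothetical.

```python
import configparser
import os

# Options in the hodrc above whose values look like filesystem paths.
# This selection is an assumption, not taken from the HOD documentation.
PATH_OPTIONS = ("java-home", "batch-home", "temp-dir", "work-dirs")


def load_hodrc(text):
    """Parse hodrc text; interpolation is off so '%' stays literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def missing_paths(cfg, exists=os.path.exists):
    """Return (section, option, path) for each path not present on this node."""
    missing = []
    for section in cfg.sections():
        for option in PATH_OPTIONS:
            if not cfg.has_option(section, option):
                continue
            # work-dirs holds a comma-separated list; the others one path.
            for path in cfg.get(section, option).split(","):
                path = path.strip()
                if path and not exists(path):
                    missing.append((section, option, path))
    return missing


SAMPLE = """\
[hod]
java-home = /usr/local/jdk1.6.0_02
temp-dir  = /tmp/hod

[ringmaster]
work-dirs = /tmp/hod/1,/tmp/hod/2
"""

if __name__ == "__main__":
    for section, option, path in missing_paths(load_hodrc(SAMPLE)):
        print("%s: %s -> %s does not exist here" % (section, option, path))
```

A missing temp-dir or work-dirs entry is not necessarily an error, as
those directories may be created at run time; the output is only a
starting point for comparing nodes.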


Some other useful information:
Linux 2.6.18-128.7.1.el5
Python 2.5.5
Twisted 10.0.0
zope 3.3.0
java version "1.6.0_02"
hadoop version 0.20.2



-- 
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11
