Hello all:

I'm the administrator of an existing, shared HPC cluster.  The
majority of this cluster's users run MPI jobs and do not wish to
switch to hadoop or other systems.  However, one of my users wishes to
run hadoop jobs on this cluster.

In order to accommodate the variety of users, our cluster operates
under the principle that ALL nodes are scheduled via the resource
manager (torque+maui).  No compute nodes may be used without being
requested via the scheduler.  This applies to anyone wishing to run
hadoop processes on this cluster as well.

To that end, I found HOD, which appears to have been designed to do
just what I need: request nodes via the scheduler, set up a hadoop
"cluster" on those nodes, and then exit cleanly when done, returning
the nodes to the available pool.
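
If I'm reading the HOD docs right, the workflow I'm after looks
roughly like this (the directory passed to -d is just local state HOD
writes to):

  hod allocate -d ~/hadoop-test-1 -n 4   # request nodes via torque, bring up hadoop
  # ... run hadoop jobs against the provisioned cluster ...
  hod deallocate -d ~/hadoop-test-1      # tear it down, return nodes to the pool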

Unfortunately, I've been having problems getting it to work.  For
testing purposes, I'm setting everything up in my home directory; once
I have a better grasp of what's required to make it work, I'll set it
up for other users.

The issue I'm presently running into appears to involve how resources
are requested via the resource manager.  My cluster consists of 24
8-core nodes and 6 16-core nodes.  Normally, when an MPI user requests
8 nodes, they're very likely to get one physical node with all 8 cores
allocated to that user/job.  There are additional constraints a user
can add to a resource request.  For example, they can request
nodes=8:ppn=8, which in essence says, "give me 8 nodes, each with 8
cores on that node".  Nodes can also be called out by name.
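
For those not familiar with torque's syntax, the equivalent direct
qsub requests would look something like this (job.sh is just a
placeholder script):

  qsub -l nodes=8 job.sh                           # 8 "nodes", which torque may pack onto one 8-core host
  qsub -l nodes=8:ppn=8 job.sh                     # 8 physical nodes, 8 processors per node
  qsub -l nodes=compute-0-15+compute-0-16 job.sh   # specific hosts, joined with '+'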

Presently, hod simply takes the -n specified on the command line
(number of nodes) and requests that of torque.  So, if I request fewer
than 8 nodes, all of them end up on the same physical node, and the
process of setting up a hadoop cluster blows up ("file already exists"
errors).  If I request more than 8 nodes, I end up with 2 physical
nodes, and the startup appears to succeed (with mapreduce and hdfs
running on separate physical nodes).  Unfortunately, it's still trying
to start that many hadoop processes.
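
The packing behavior is easy to see outside of hod with an interactive
job; torque writes one line per allocated core to $PBS_NODEFILE:

  qsub -I -l nodes=4
  cat $PBS_NODEFILE               # the same hostname listed 4 times
  sort -u $PBS_NODEFILE | wc -l   # 1 unique physical node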

When I go to use that hadoop cluster, my client can't communicate with
it (java exceptions when trying to connect to the IP:port of the
ringmaster).  I logged into that node and checked with netstat, and
sure enough, nothing was listening on that port.  I also checked the
port the startup process spat out for the http status, and there was
nothing listening on that port either.  I'm guessing that the multiple
hadoop processes are still stepping on each other on the same machine
somehow.
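
For reference, the checks I ran were along these lines
(<ringmaster-port> and <http-port> stand in for the values hod printed
at startup):

  netstat -tln | grep <ringmaster-port>   # no listener
  netstat -tln | grep <http-port>         # nothing here either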

So, I've been trying to trick/force hod into requesting resources from
the scheduler in such a way that each hadoop node lands on a separate
physical node.  Unfortunately, I've been unable to do so, as hod
appears to overwrite any attempt to override the "nodes" resource
specification.  For example, I've tried:

hod allocate -d /home/kusznir/hadoop-test-1 -n 4 --resource_manager.options='l:nodes=compute-0-15+compute-0-16+compute-0-17+compute-0-18'

hod allocate -d /home/kusznir/hadoop-test-1 -n 4 --resource_manager.options='l:nodes=32'

hod allocate -d /home/kusznir/hadoop-test-1 -n 4 --resource_manager.options='l:nodes=4:ppn=8'

All of these failed (and hod refused to even accept the last one).

So, how exactly does one make hod work on clusters with multi-core nodes?

--Jim
