Daniel Templeton wrote:
Thanks for the reply. Grid Engine works roughly the same way with
respect to parallel jobs, except that we call the tasks the master and
the slaves. Grid Engine does not have a pbsdsh equivalent, but it would
be a really trivial wrapper script to write around qrsh, which is pbsdsh
minus the automatic use of the nodes file (called the pe_hostfile in
Grid Engine).
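Something like the following, roughly (untested; it assumes the wrapper
runs inside a tightly integrated parallel job, so qrsh -inherit can
reach the slave hosts, and that the hostname is the first column of
each $PE_HOSTFILE line):

    # Sketch of a pbsdsh-like wrapper for Grid Engine: run the given
    # command once on every host listed in the pe_hostfile.
    import os
    import subprocess
    import sys

    def sge_dsh(cmd):
        procs = []
        with open(os.environ['PE_HOSTFILE']) as hostfile:
            for line in hostfile:
                host = line.split()[0]
                # qrsh -inherit runs the command on an already-allocated
                # host of this job, without going back to the scheduler.
                procs.append(subprocess.Popen(
                    ['qrsh', '-inherit', host] + cmd))
        return max(p.wait() for p in procs)

    if __name__ == '__main__':
        sys.exit(sge_dsh(sys.argv[1:]))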
I assume from the 3-task minimum that the JobTracker gets a slot, the
NameNode gets a slot, and there has to be at least one slot running a
DataNode/TaskTracker. Correct?
Yes
Should a single job be prevented from running more than one hodring on
a single host?
More than one hodring can be launched on a single host. However, this
means more than one instance of each slave would get launched - e.g.
two TaskTrackers and two DataNodes. In practice, we've seen that while
this also works, when we start running M/R tasks on such a system, it
slows the system down quite a bit. Hence I don't think this is really
useful.
How do I go about contributing this Grid Engine extension to the HoD
source base?
Please feel free to submit a patch if you've figured out all the
details. It should be against the current code base. See
http://wiki.apache.org/hadoop/HowToContribute for guidelines on
contributing.
Thanks!
Hemanth
Hemanth Yamijala wrote:
Daniel Templeton wrote:
Hi,
I have a functioning module for Grid Engine for HoD, but some parts
of it are currently hard-coded to my workstation. In cleaning up
those elements, I need some advice. Hopefully this is the right forum.
So, in the hodlib/NodePools/torque.py file, there's a runWorkers()
method. That method makes a single call to pbsdsh to start the
NameNode, DataNodes, JobTracker, and TaskTrackers. I know nada
about Torque, so please tell me if I'm interpreting this correctly.
It would appear that pbsdsh somehow reads out of the environment
how many hodring processes it should start up and executes them
remotely, and each hodring then figures out what service it should run.
Roughly right. In Torque, when a set of nodes is assigned to a job,
the first node in that list is special (it's called the mother superior
- MS); the other nodes are called sisters. The job that's submitted to
Torque is a HOD process called 'ringmaster'. The ringmaster starts on
the MS and invokes runWorkers, which executes pbsdsh. AFAIK, pbsdsh
reads the environment to find a 'nodes' file that Torque writes out.
This file contains all the sisters allocated for the job (including
the MS). pbsdsh executes the command passed to it - another HOD
process, called hodring - on all of these nodes. The hodring
processes work with the ringmaster to decide which service to run.
In a sense, the ringmaster coordinates which service to start where,
and informs each hodring to start that service.
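In pseudo-Python, the Torque side of runWorkers boils down to something
like this (a paraphrase, not the actual HOD code; the hodring arguments
are whatever options the ringmaster wants to pass along):

    # Paraphrase of the Torque flow: ringmaster, running on the mother
    # superior, fans the hodring command out to every node of the job.
    # pbsdsh finds the allocated nodes from the job's environment by
    # itself, so no host list needs to be passed explicitly.
    import subprocess

    def run_workers(hodring_cmd):
        # hodring_cmd is the argv for the hodring process, i.e. the
        # hodring script plus its config options (illustrative).
        return subprocess.call(['pbsdsh'] + hodring_cmd)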
In Grid Engine, the rough equivalent of pbsdsh is qrsh. (I think.)
With qrsh, the master assigns the HoD job a set of nodes, and I then
have to step through that set of nodes and qrsh to each one to start
the hodring services. As far as I can tell, the total number of
hodring services I need to start is 1 for the NameNode + 1 for the
JobTracker + n for the DataNodes + m for the TaskTrackers.
HOD has a facility to use an HDFS service that's started outside of
HOD. In that mode, it does not start a NameNode or DataNodes. Also, the
number of DataNodes always equals the number of TaskTrackers (if HDFS
services are started with HOD).
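For reference, pointing HOD at an external HDFS is done in the hodrc
config, roughly like this (the host and ports below are placeholders -
check the HOD documentation for the exact option names):

    [gridservice-hdfs]
    external  = true
    host      = namenode.example.com
    fs_port   = 8020
    info_port = 50070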
The thing that I'm not grokking is how the hodrings know what
services to start, and how I should be parceling them out across the
nodes of the cluster.
This is decided by the ringmaster process. The logic is independent
of the resource manager in use, and hence is not something you need
to worry about when porting to a new resource manager.
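Conceptually, each hodring just phones home and asks the ringmaster
what to run; the resource manager's only responsibility is to get the
hodring process started on each node. A hypothetical sketch of that
handshake (the names here are made up for illustration - see the actual
code for the real interface):

    # Hypothetical sketch of the hodring side of the handshake. The
    # ringmaster, not the resource manager, decides which daemons this
    # node runs. getCommand is an illustrative name, not the real API.
    import xmlrpclib  # Python 2, which is what HOD is written in

    def fetch_services(ringmaster_addr, hostname):
        proxy = xmlrpclib.ServerProxy('http://%s/' % ringmaster_addr)
        # e.g. returns ['datanode', 'tasktracker'] for a slave node
        return proxy.getCommand(hostname)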
Should I be making sure I have two hodrings per node, one for the
DataNode and one for the TaskTracker?
No, a single hodring gets to start both daemons.
If I were to go start a dozen hodrings, one on each of a dozen
machines, would they work out among themselves how many should be
DataNodes and how many should be TaskTrackers?

One more thing. If the above is on the mark, that means you're
consuming a queue slot for each DataNode unless you use an external
HDFS service. That seems like a waste of cluster resources, since
slots tend to correspond more to compute resources than to I/O. I have
to wonder if it wouldn't be more efficient from a cluster perspective
to have each hodring start both a DataNode and a TaskTracker. It would
slightly oversubscribe that job slot, but that may be better than
grossly undersubscribing two.
Explained above.
Thanks
Hemanth