Daniel Templeton wrote:
Hi,
I have a functioning module for Grid Engine for HoD, but some parts of
it are currently hard-coded to my workstation. In cleaning up those
elements, I need some advice. Hopefully this is the right forum.
So, in the hodlib/NodePools/torque.py file, there's a runWorkers()
method. In that method, it makes a single call to pbsdsh to start the
NameNode, DataNodes, JobTracker, and TaskTracker. I know nada about
Torque, so please tell me if I'm interpreting this correctly. It
would appear that pbsdsh somehow reads out of the environment how
many hodring processes it should start up, executes them remotely,
and each hodring then figures out what service it should run.
Roughly right. In Torque, when a set of nodes is assigned to a job, the
first node in that list is special (it's called the mother superior, or
MS); the other nodes are called sisters. The job that's submitted to
Torque is a HOD process called 'ringmaster'. The ringmaster starts on
the MS and invokes runWorkers, which executes pbsdsh. AFAIK, pbsdsh
reads the environment and gets a 'nodes' file that Torque writes out.
This file lists all the nodes allocated for the job (the sisters and
the MS). pbsdsh executes the command passed to it - another HOD
process, called hodring - on all of these nodes. The hodring processes
work with the ringmaster to decide which service to run. In a sense,
the ringmaster coordinates which service to start where, and informs
each hodring to start that service.
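
To make the flow above concrete, here is a minimal sketch, not HOD's
actual code. It assumes the nodes file holds one hostname per line
(possibly repeated) and that the first entry is the mother superior.
The function names, the sample hostnames, and the bare "hodring"
command line are all placeholders for illustration.

```python
def parse_nodes_file(lines):
    """Return unique hostnames from nodes-file lines, preserving order."""
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def plan_hodrings(hosts, hodring_cmd):
    """Pair each allocated host with the command to run there."""
    return [(host, list(hodring_cmd)) for host in hosts]


# Hypothetical nodes-file content; the first entry plays the mother superior.
sample = ["node01\n", "node01\n", "node02\n", "node03\n"]
hosts = parse_nodes_file(sample)
mother_superior, sisters = hosts[0], hosts[1:]

# Placeholder command line; the real hodring options are omitted.
plan = plan_hodrings(hosts, ["hodring"])
```

Every host gets the same command; which service each hodring ends up
running is decided later by the ringmaster, not by this launch step.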
In Grid Engine, the rough equivalent of pbsdsh is qrsh. (I think.)
With qrsh, the master assigns the HoD job a set of nodes, and I then
have to step through that set of nodes and qrsh to each one to start
the hodring services. As far as I can tell, the total number of
hodring services I need to start is 1 for the NameNode + 1 for the
JobTracker + n for the DataNodes + m for the TaskTrackers.
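
A rough sketch of the "step through the nodes and qrsh to each one"
idea follows. It assumes the job runs in a Grid Engine parallel
environment whose host file has lines of the form "host slots queue
processor-range", and that "qrsh -inherit <host> <command>" is used to
start a task on an allocated host. The function names, the sample
lines, and the bare "hodring" command are placeholders, and the sketch
only builds the command lines without running them.

```python
def parse_pe_hostfile(lines):
    """Return (hostname, slots) pairs from PE host file lines."""
    allocation = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            allocation.append((fields[0], int(fields[1])))
    return allocation


def qrsh_commands(allocation, hodring_cmd):
    """Build one qrsh command line per allocated host (one hodring each)."""
    return [["qrsh", "-inherit", host] + list(hodring_cmd)
            for host, _slots in allocation]


# Hypothetical host file content for a two-node allocation.
sample = [
    "node01 2 all.q@node01 UNDEFINED\n",
    "node02 2 all.q@node02 UNDEFINED\n",
]
commands = qrsh_commands(parse_pe_hostfile(sample), ["hodring"])
```

Each entry in commands could then be handed to subprocess.Popen, which
is where this differs from the single pbsdsh call on the Torque side.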
HOD has a facility to use an HDFS service that's started outside of
HOD. In that mode, it does not start the NameNode or DataNodes. Also,
the number of DataNodes always equals the number of TaskTrackers (if
HDFS services are started with HOD).
The thing that I'm not grokking is how the hodrings know what services
to start, and how I should be parceling them out across the nodes of
the cluster.
This is decided by the ringmaster process. The logic is independent of
the resource manager in use, so you need not worry about it when
porting to a new resource manager.
Should I be making sure I have two hodrings per node, one for the
DataNode and one for the TaskTracker?
No, a single hodring starts both daemons.
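
As a small worked example of the counting rules in this thread, the
sketch below tallies daemons for a given number of worker nodes. It
encodes only what is stated here: one hodring per worker node starts
both a DataNode and a TaskTracker, and with an external HDFS no
NameNode or DataNodes are started. How the NameNode and JobTracker are
placed on nodes is not covered, so only worker hodrings are counted.
The function name and dictionary keys are made up for illustration.

```python
def daemon_plan(worker_nodes, external_hdfs=False):
    """Count the daemons started for a given number of worker nodes."""
    if worker_nodes < 0:
        raise ValueError("worker_nodes must be non-negative")
    return {
        # HDFS daemons are skipped when an external HDFS is used.
        "NameNode": 0 if external_hdfs else 1,
        "DataNode": 0 if external_hdfs else worker_nodes,
        "JobTracker": 1,
        "TaskTracker": worker_nodes,
        # One hodring per worker node starts both worker daemons.
        "worker_hodrings": worker_nodes,
    }


with_hdfs = daemon_plan(12)
external = daemon_plan(12, external_hdfs=True)
```

With HDFS started by HOD, the DataNode and TaskTracker counts are
always equal, and the number of worker hodrings matches either one
rather than their sum.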
If I were to go start a dozen hodrings, one on each of a dozen
machines, would they work out among themselves how many should be
DataNodes and how many should be TaskTrackers? One more thing. If the
above is on the mark, that means you're consuming a queue slot for
each DataNode unless you use an external hdfs service. That seems
like a waste of cluster resources since slots tend to correspond more
to compute resources than I/O. I have to wonder if it wouldn't be
more efficient from a cluster perspective to have each hodring start a
DataNode and a TaskTracker. It would slightly oversubscribe that job
slot, but that may be better than grossly undersubscribing two.
Explained above: a single hodring already starts both a DataNode and a
TaskTracker.
Thanks
Hemanth