Hi,

I have a functioning Grid Engine module for HoD, but parts of it are currently hard-coded to my workstation. As I clean up those pieces, I could use some advice. Hopefully this is the right forum.

So, in the hodlib/NodePools/torque.py file, there's a runWorkers() method. That method makes a single call to pbsdsh to start the NameNode, DataNodes, JobTracker, and TaskTrackers. I know nada about Torque, so please tell me if I'm interpreting this correctly. It appears that pbsdsh somehow reads from the environment how many hodring processes it should start, executes them remotely, and each hodring then figures out which service it should run.
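
Here is how I picture that call, purely as a sketch of my interpretation (run_workers_torque and hodring_cmd are my own placeholder names, not the real ones in torque.py):

    # A minimal sketch of how I read the pbsdsh path -- my interpretation,
    # not the actual code in torque.py:
    import subprocess

    def run_workers_torque(hodring_cmd):
        # One pbsdsh call fans the command out across the job's allocation;
        # each resulting hodring then works out which service it should run.
        subprocess.check_call(["pbsdsh"] + hodring_cmd.split())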

In Grid Engine, the rough equivalent of pbsdsh is qrsh. (I think.) With qrsh, the master assigns the HoD job a set of nodes, and I then have to step through that set and qrsh to each node to start the hodring services. As far as I can tell, the total number of hodring services I need to start is 1 for the NameNode + 1 for the JobTracker + n for the DataNodes + m for the TaskTrackers. The thing that I'm not grokking is how the hodrings know what services to start, and how I should be parceling them out across the nodes of the cluster. Should I be making sure I have two hodrings per node, one for the DataNode and one for the TaskTracker? If I were to start a dozen hodrings, one on each of a dozen machines, would they work out among themselves how many should be DataNodes and how many should be TaskTrackers?
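
For concreteness, here is roughly the loop I'm imagining on the Grid Engine side, assuming the allocation comes from $PE_HOSTFILE and tight integration via qrsh -inherit. run_workers_sge and hodring_cmd are placeholders, and how each hodring learns its role is exactly the part I don't yet understand:

    # Rough sketch of a qrsh-based runWorkers() equivalent (placeholder names).
    import os
    import subprocess

    def run_workers_sge(hodring_cmd):
        # Grid Engine lists the allocated hosts in $PE_HOSTFILE, one line
        # per host: "hostname slots queue processor_range".
        hosts = []
        with open(os.environ["PE_HOSTFILE"]) as hostfile:
            for line in hostfile:
                hostname, slots = line.split()[:2]
                hosts.extend([hostname] * int(slots))

        # Start one hodring per allocated slot; "qrsh -inherit" runs the
        # command on a slave host inside the existing allocation.
        return [
            subprocess.Popen(["qrsh", "-inherit", host] + hodring_cmd.split())
            for host in hosts
        ]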

One more thing. If the above is on the mark, that means you're consuming a queue slot for each DataNode unless you use an external HDFS service. That seems like a waste of cluster resources, since slots tend to correspond more to compute resources than to I/O. I have to wonder if it wouldn't be more efficient from a cluster perspective to have each hodring start both a DataNode and a TaskTracker. It would slightly oversubscribe that job slot, but that may be better than grossly undersubscribing two.
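
To put numbers on it, assuming a hypothetical 12-node allocation and one slot per hodring (again, only if my reading above is right):

    # Back-of-the-envelope slot math for a hypothetical 12-node allocation.
    nodes = 12
    one_service_per_hodring = 1 + 1 + nodes + nodes  # NN + JT + DataNodes + TaskTrackers
    combined_hodrings = 1 + 1 + nodes                # NN + JT + (DataNode + TaskTracker) pairs
    print(one_service_per_hodring, combined_hodrings)  # 26 vs. 14 slots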

Thanks,
Daniel
