Apologies in advance if there is a more specific dev list for HOD. I've written interfaces so that HOD can be used on Moab (with Torque or any other resource manager) as well as SGE, and I would like to contribute them, but writing them raised a larger problem. At some point, daemons must be started on all of the nodes assigned to the job. This task is currently performed by pbsdsh, as part of the Scheduler/torque module. However, parallel starting isn't generally the job of the resource manager, so this makes things a bit ugly. The choice of method for parallel starting (it could be mpiexec, pbsdsh, another dsh, ssh, SGE scripts, and so on) should really be a separate configurable option instead of being dependent on the choice of resource manager.
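To make the idea concrete, here is a rough sketch of what a separately configurable launcher could look like. Everything in it is hypothetical: the names `LAUNCHERS`, `build_launch_commands`, `pbsdsh_commands` and `ssh_commands` do not exist in HOD, and the `-u` flag for pbsdsh is how I remember Torque's version, so it should be checked against the local man page. The sketch only builds argument lists and does not run anything.

```python
# Hypothetical sketch: none of these names exist in HOD today.
import shlex


def pbsdsh_commands(nodes, daemon_cmd):
    # pbsdsh discovers the job's nodes itself, so a single invocation is
    # enough and the node list is ignored here.
    # Assumption: "-u" (run once per node) as in Torque's pbsdsh.
    return [["pbsdsh", "-u"] + list(daemon_cmd)]


def ssh_commands(nodes, daemon_cmd):
    # One ssh invocation per node; the remote command is passed as a
    # single shell-quoted string.
    remote = " ".join(shlex.quote(arg) for arg in daemon_cmd)
    return [["ssh", node, remote] for node in nodes]


# Maps the value of a (hypothetical) "parallel launcher" config option to
# the function that builds the commands. Other methods (mpiexec, SGE
# scripts, ...) would register here, independent of the resource manager.
LAUNCHERS = {
    "pbsdsh": pbsdsh_commands,
    "ssh": ssh_commands,
}


def build_launch_commands(launcher, nodes, daemon_cmd):
    """Return a list of argv lists that start daemon_cmd on every node."""
    try:
        builder = LAUNCHERS[launcher]
    except KeyError:
        raise ValueError("unknown parallel launcher: %r" % (launcher,))
    return builder(nodes, daemon_cmd)
```

With something like this, the Scheduler module would only need to supply the node list, and the launcher name would come from its own config option, e.g. `build_launch_commands("ssh", nodes, daemon_cmd)`.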
However, before I start refactoring to move that functionality outside of the Scheduler module, I wanted to check whether this is something the HOD folks would be interested in, and whether there is already work going on in this area that I don't know about.

Thanks,
Nate
