[
https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nate Woody updated HADOOP-5441:
-------------------------------
Attachment: (was: HOD_patch2)
> HOD refactoring to ease integration with scheduler/resource managers other
> than torque
> --------------------------------------------------------------------------------------
>
> Key: HADOOP-5441
> URL: https://issues.apache.org/jira/browse/HADOOP-5441
> Project: Hadoop Core
> Issue Type: Improvement
> Components: contrib/hod
> Affects Versions: 0.19.1
> Environment: All
> Reporter: Nate Woody
> Fix For: 0.19.1
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Situation: HOD currently uses the pbsdsh command (a distributed shell that
> starts remote processes through Torque's TM interface) to start processes
> on all nodes in the job. This call is provided as part of a torqueInterface
> class that is meant to abstract interactions with the Torque resource
> manager (RM). However, other RMs typically do not provide this
> functionality; instead, it is performed by a distributed command available
> on the HPC system, such as mpiexec, ssh, or site-specific scripts. Because
> pbsdsh is specific to Torque, writing HOD interfaces to other RMs is
> difficult: the implementer is forced to choose the remote start method per
> RM, even though that choice depends on the system rather than on the RM.
> Proposal: Refactor the torqueInterface and nodePool classes so that the
> choice of remote start method is available as a configuration option in
> hodrc. This involves fairly simple changes: removing the pbsdsh command from
> the Scheduler class and adding a configuration step that starts the
> appropriate remote start wrapper. The selection of the nodePool class will
> be altered to allow dynamic loading of classes, so that new interfaces people
> choose to write will not require altering HOD code. Provide remote start
> classes for pbsdsh, mpiexec, ssh, as well as custom scripts (sites often
> provide mpiexec wrappers that ensure proper selection of network interfaces,
> etc.). Provide interface classes for SGE and Moab, as well as an updated
> Torque class.
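
A minimal sketch of the two ideas in the proposal, not the attached patch: a
remote start method selected from configuration, and a class loaded by name at
run time. All class names, the "remote-start" option name, and the exact
command lines are assumptions made for illustration; the real hodrc options
and launcher arguments may differ.

```python
import importlib
import shlex


class PbsdshStart:
    """Torque only: a single pbsdsh call runs the command on the job's nodes."""

    def build_commands(self, command, nodes):
        return [["pbsdsh"] + shlex.split(command)]


class MpiexecStart:
    """Ask mpiexec for one process per node; placement depends on the MPI install."""

    def build_commands(self, command, nodes):
        return [["mpiexec", "-n", str(len(nodes))] + shlex.split(command)]


class SshStart:
    """One ssh invocation per node; ssh passes the command string to the remote shell."""

    def build_commands(self, command, nodes):
        return [["ssh", node, command] for node in nodes]


class ScriptStart:
    """Site-specific wrapper script, for example an mpiexec wrapper."""

    def __init__(self, script):
        self.script = script

    def build_commands(self, command, nodes):
        return [[self.script] + shlex.split(command)]


# Hypothetical registry of built-in remote start methods.
REMOTE_START = {"pbsdsh": PbsdshStart, "mpiexec": MpiexecStart, "ssh": SshStart}


def make_remote_start(config):
    """Pick the remote start wrapper named in a (hypothetical) config mapping."""
    name = config["remote-start"]
    if name in REMOTE_START:
        return REMOTE_START[name]()
    # Any other value is treated as the path to a custom launch script.
    return ScriptStart(name)


def load_class(dotted_path):
    """Load 'package.module.ClassName' at run time, so a new RM interface
    can be named in configuration without editing the calling code."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


# Usage: the same call site works for any configured method.
starter = make_remote_start({"remote-start": "ssh"})
for argv in starter.build_commands("echo hello", ["node1", "node2"]):
    print(argv)
```

With this split, the RM interface class only has to allocate nodes, and the
launch method becomes a per-site setting rather than a per-RM one.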
--
This message is automatically generated by JIRA.