[ 
https://issues.apache.org/jira/browse/HADOOP-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate Woody updated HADOOP-5441:
-------------------------------

    Attachment: HOD_patch1

Patch to resolve issue
1) Allow dynamic loading of nodePools (other than torque)
2) Move pbsdsh functionality out of Schedulers/torque into a separate 
remote-start module
3) Expose a new config setting that specifies the remote-start method, 
with pbsdsh as the default
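
For readers unfamiliar with the technique in item 1, dynamic loading of a 
nodePool class could look roughly like the sketch below.  This is an 
illustration in present-day Python, not the code in HOD_patch1; the function 
name load_node_pool_class and the idea of passing a module name and a class 
name are assumptions made for the example.

```python
import importlib


def load_node_pool_class(module_name, class_name):
    """Import module_name and return the class named class_name from it.

    Hypothetical helper: both names would come from configuration, so a new
    nodePool implementation can be used without editing the caller.
    """
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name, None)
    # Reject missing attributes and attributes that are not classes.
    if not isinstance(cls, type):
        raise ImportError("%s has no class named %s" % (module_name, class_name))
    return cls
```

With something like this, supporting a new resource manager means dropping in 
a module and naming it in the configuration, rather than adding another 
branch to the HOD code that picks the nodePool.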

> HOD refactoring to ease integration with scheduler/resource managers other 
> than torque
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5441
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hod
>    Affects Versions: 0.19.1
>         Environment: All
>            Reporter: Nate Woody
>             Fix For: 0.19.1
>
>         Attachments: HOD_patch1
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Situation: HOD currently uses the pbsdsh command (a distributed shell that 
> works via Torque's TM interface to start remote processes) to start 
> processes on all nodes in the job.  This call is provided as part of a 
> torqueInterface class that is meant to abstract interactions with the Torque 
> resource manager (RM).  However, other RMs do not typically provide this 
> functionality; it is instead usually performed by a distributed command 
> available on the HPC system, such as mpiexec, ssh, or site-specific scripts.  
> Because pbsdsh is specific to Torque, writing HOD interfaces to other RMs is 
> somewhat difficult: the implementer is forced to choose the remote start 
> method per RM, even though that choice does not really depend on the RM.
> Proposal: Refactor the torqueInterface and nodePool classes so that the 
> choice of remote start method is available as a configuration option in 
> hodrc.  This involves fairly simple changes to remove the pbsdsh command from 
> the Scheduler class and to add a configuration step that starts the 
> appropriate remote start wrapper.  The selection of the nodePool class will 
> be altered to allow dynamic loading of classes, so that new interfaces people 
> choose to write will not require altering HOD code.  Provide remote start 
> classes for pbsdsh, mpiexec, and ssh, as well as custom scripts (sites often 
> provide mpiexec wrappers that ensure proper selection of network interfaces, 
> etc.).  Provide interface classes for SGE and Moab, as well as an updated 
> Torque class.
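
As a rough illustration of the proposal above, the remote start method could 
be mapped to a command line along these lines.  This is a minimal sketch, not 
the patch itself: the function name, the method strings other than "pbsdsh", 
and the "script" option are assumptions, and real sites will need different 
launcher arguments.

```python
def build_remote_start_commands(method, nodes, command, script=None):
    """Return a list of argv lists that start `command` on the job's nodes.

    Hypothetical helper.  `method` stands in for the remote-start setting
    read from hodrc; `nodes` is the list of allocated host names.
    """
    if method == "pbsdsh":
        # pbsdsh itself spawns the program on the nodes allocated to the job.
        return [["pbsdsh"] + list(command)]
    if method == "mpiexec":
        # One process per node; site wrappers may need other arguments.
        return [["mpiexec", "-n", str(len(nodes))] + list(command)]
    if method == "ssh":
        # ssh has no notion of the job, so issue one command per node.
        return [["ssh", node] + list(command) for node in nodes]
    if method == "script":
        if script is None:
            raise ValueError("method 'script' requires a script path")
        return [[script] + list(command)]
    raise ValueError("unknown remote start method: %r" % (method,))
```

The point of the design is that only this mapping changes with the launcher, 
so an SGE or Moab nodePool can reuse whichever method the site prefers.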

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
