It isn't necessary. The orted will already open and use the local plm if you 
simply set OMPI_MCA_plm=foo in its environment. The rsh, tm, and slurm plm 
modules already do this, both so they can execute a tree-like spawn (in the 
rsh case) and because I needed ssh on the backend nodes to locally launch 
"slaves" on RoadRunner and other machines.

The required code (already in those modules) is:

    /* enable local launch by the orteds */
    var = mca_base_param_environ_variable("plm", NULL, NULL);
    opal_setenv(var, "rsh", true, &env);
    free(var);
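
For context, mca_base_param_environ_variable("plm", NULL, NULL) just builds 
the string "OMPI_MCA_plm", so the net effect is OMPI_MCA_plm=rsh in the 
environment handed to the remote daemons. As a self-contained sketch of where 
that fragment typically sits (the helper name and the env handling here are 
illustrative, not the actual rsh module code):

    #include <stdlib.h>
    #include "opal/util/opal_environ.h"
    #include "opal/mca/base/mca_base_param.h"

    /* illustrative helper, not actual OMPI code: add the variable that
     * tells each remote orted which plm to open for local launches */
    static void enable_orted_local_launch(char ***env)
    {
        char *var;

        /* builds the string "OMPI_MCA_plm" */
        var = mca_base_param_environ_variable("plm", NULL, NULL);

        /* results in OMPI_MCA_plm=rsh in the daemons' environment */
        opal_setenv(var, "rsh", true, env);
        free(var);
    }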


You don't want the orted using the hnp ess module, as it would then try to 
track its own launches and completely forget that it is a remote orted with 
slightly different responsibilities.

If you need it to execute a different plm on the backend, please let me know; 
it is a trivial change to allow specification of remote launch agents, and if 
we do it, we should do it for all the plm modules (see the sketch below).
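
Roughly, each plm's component open would register a string parameter and pass 
its value instead of the hard-coded "rsh" when building the daemons' 
environment. A minimal sketch, using the rsh plm as the example and a 
hypothetical parameter name "remote_plm" (var and env as in the snippet 
above):

    /* sketch only - the "remote_plm" parameter name is hypothetical */
    char *remote_plm;

    mca_base_param_reg_string(&mca_plm_rsh_component.super.base_version,
                              "remote_plm",
                              "Which plm the backend orteds use for local launches",
                              false, false, "rsh", &remote_plm);

    /* then, when assembling the environment for the remote daemons */
    var = mca_base_param_environ_variable("plm", NULL, NULL);
    opal_setenv(var, remote_plm, true, &env);
    free(var);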

Ralph

On Dec 18, 2009, at 7:43 AM, Damien Guinier wrote:

> Hi Ralph
> 
> In Open MPI, I am working on a small new feature: hnp_always_use_plm.
> - To launch the final application, mpirun uses "orted via plm" (the Process 
> Lifecycle Management module) on remote nodes, but a local "fork()" on its own 
> node. So the first compute node is not handled the same way as the other 
> compute nodes, and some debug tools (padb ...) and management tools 
> (squeue -s ...) are affected by this difference.
> To simplify the use of these cluster tools, I propose adding the possibility 
> to use "orted via plm" both remotely and locally.
> 
> I made a patch that adds the parameter "OMPI_MCA_ess_hnp_always_use_plm" to 
> force use of the "plm" module everywhere. With my patch, nothing is changed 
> by default (no impact).
> 
> Can you tell me whether this feature (and the patch) looks good?
> 
> Thank you
> 
> Damien
> 
> diff orte/mca/ess/hnp/ess_hnp.h
> --- a/orte/mca/ess/hnp/ess_hnp.h        Tue Dec 15 15:31:24 2009 +0100
> +++ b/orte/mca/ess/hnp/ess_hnp.h        Tue Dec 15 18:19:18 2009 +0100
> @@ -27,7 +27,7 @@
> int orte_ess_hnp_component_open(void);
> int orte_ess_hnp_component_close(void);
> int orte_ess_hnp_component_query(mca_base_module_t **module, int *priority);
> -
> +extern int mca_ess_hnp_always_use_plm;
> 
> ORTE_MODULE_DECLSPEC extern orte_ess_base_component_t mca_ess_hnp_component;
> 
> diff orte/mca/ess/hnp/ess_hnp_component.c
> --- a/orte/mca/ess/hnp/ess_hnp_component.c      Tue Dec 15 15:31:24 2009 +0100
> +++ b/orte/mca/ess/hnp/ess_hnp_component.c      Tue Dec 15 18:19:18 2009 +0100
> @@ -33,6 +33,7 @@
> #include "orte/mca/ess/hnp/ess_hnp.h"
> 
> extern orte_ess_base_module_t orte_ess_hnp_module;
> +int mca_ess_hnp_always_use_plm = 0;
> 
> /*
> * Instantiate the public struct with all of our public information
> @@ -63,6 +64,10 @@
> int
> orte_ess_hnp_component_open(void)
> {
> +    mca_base_param_reg_int(&mca_ess_hnp_component.base_version,
> +                           "always_use_plm",
> +                           "Used to force use of the plm on all nodes",
> +                           false, false, mca_ess_hnp_always_use_plm, &mca_ess_hnp_always_use_plm);
>    return ORTE_SUCCESS;
> }
> 
> diff orte/mca/ess/hnp/ess_hnp_module.c
> --- a/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 15:31:24 2009 +0100
> +++ b/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 18:19:18 2009 +0100
> @@ -442,9 +442,12 @@
>     * node object
>     */
>    OBJ_RETAIN(proc);   /* keep accounting straight */
> +    if (0 == mca_ess_hnp_always_use_plm)
> +    {
>    node->daemon = proc;
>    node->daemon_launched = true;
>    node->state = ORTE_NODE_STATE_UP;
> +    }
> 
>    /* record that the daemon job is running */
>    jdata->num_procs = 1;
> 
