Yikes - you most definitely do -not- want to do that. It -might- work in some 
circumstances, but it will lead to considerable confusion in the system in most 
cases.

We have had that happen to users by accident - it took a long time to debug what 
was going on. We definitely don't want to go back to those days!


On Dec 18, 2009, at 9:10 AM, Damien Guinier wrote:

> Sorry, I was not clear. It is true that the same PLM is used on all nodes.
> My parameter name was incorrect; "mpirun_not_as_orted" would be better. My 
> problem is simple:
> -   I want "mpirun" to not have the "orted" launch feature.
> -   To create processes on the "mpirun node", I want the "plm" to launch an 
> "orted" (on this "mpirun node") and to ask that orted to create the processes.
> 
> That way, process trackers and debug tools see no difference between nodes.
> 
> Sorry for this confusion.
> Damien
> 
Ralph Castain wrote:
>> It isn't necessary. The orted will already open and use the local plm if you 
>> simply set OMPI_MCA_plm=foo in its environment. The rsh, tm, and slurm plm 
>> modules already do this so that they can execute a tree-like spawn (for rsh), 
>> and because I needed ssh on the backend nodes to locally launch "slaves" on 
>> RoadRunner and other machines.
>> 
>> The required code (already in those modules) is:
>> 
>>    /* enable local launch by the orteds */
>>    var = mca_base_param_environ_variable("plm", NULL, NULL);
>>    opal_setenv(var, "rsh", true, &env);
>>    free(var);
>> 
>> 
>> You don't want the orted using the hnp ess module as it will then try to 
>> track its own launches and totally forget that it is a remote orted with 
>> slightly different responsibilities.
>> 
>> If you need it to execute a different plm on the backend, please let me know 
>> - it is a trivial change to allow specification of remote launch agents, and 
>> we should do it for them all if we do.
>> 
>> Ralph
>> 
>> On Dec 18, 2009, at 7:43 AM, Damien Guinier wrote:
>> 
>>> Hi Ralph
>>> 
>>> In Open MPI, I am working on a small new feature: hnp_always_use_plm.
>>> - To create the final application, mpirun uses "orted via plm: Process 
>>> Lifecycle Management module" on remote nodes, but "fork()" locally. So the 
>>> first compute node does not use the same method as the other compute nodes. 
>>> Some debug tools (padb, ...) and management tools (squeue -s, ...) are 
>>> impacted by this difference.
>>> To simplify the use of these cluster tools, I propose adding the 
>>> possibility of using "orted via plm" both remotely and locally.
>>> 
>>> I made a patch adding the parameter "OMPI_MCA_ess_hnp_always_use_plm" to 
>>> use the "plm" module everywhere. With my patch, nothing is changed by 
>>> default (no impact).
>>> 
>>> Can you tell me whether this feature (and the patch) is good?
>>> 
>>> thank you
>>> 
>>> Damien
>>> 
>>> diff orte/mca/ess/hnp/ess_hnp.h
>>> --- a/orte/mca/ess/hnp/ess_hnp.h        Tue Dec 15 15:31:24 2009 +0100
>>> +++ b/orte/mca/ess/hnp/ess_hnp.h        Tue Dec 15 18:19:18 2009 +0100
>>> @@ -27,7 +27,7 @@
>>> int orte_ess_hnp_component_open(void);
>>> int orte_ess_hnp_component_close(void);
>>> int orte_ess_hnp_component_query(mca_base_module_t **module, int *priority);
>>> -
>>> +extern int mca_ess_hnp_always_use_plm;
>>> 
>>> ORTE_MODULE_DECLSPEC extern orte_ess_base_component_t mca_ess_hnp_component;
>>> 
>>> diff orte/mca/ess/hnp/ess_hnp_component.c
>>> --- a/orte/mca/ess/hnp/ess_hnp_component.c      Tue Dec 15 15:31:24 2009 +0100
>>> +++ b/orte/mca/ess/hnp/ess_hnp_component.c      Tue Dec 15 18:19:18 2009 +0100
>>> @@ -33,6 +33,7 @@
>>> #include "orte/mca/ess/hnp/ess_hnp.h"
>>> 
>>> extern orte_ess_base_module_t orte_ess_hnp_module;
>>> +int mca_ess_hnp_always_use_plm = 0;
>>> 
>>> /*
>>> * Instantiate the public struct with all of our public information
>>> @@ -63,6 +64,10 @@
>>> int
>>> orte_ess_hnp_component_open(void)
>>> {
>>> +    mca_base_param_reg_int(&mca_ess_hnp_component.base_version,
>>> +                           "always_use_plm",
>>> +                           "Whether to force use of the plm on all nodes",
>>> +                           false, false,
>>> +                           mca_ess_hnp_always_use_plm,
>>> +                           &mca_ess_hnp_always_use_plm);
>>>   return ORTE_SUCCESS;
>>> }
>>> 
>>> diff orte/mca/ess/hnp/ess_hnp_module.c
>>> --- a/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 15:31:24 2009 +0100
>>> +++ b/orte/mca/ess/hnp/ess_hnp_module.c Tue Dec 15 18:19:18 2009 +0100
>>> @@ -442,9 +442,12 @@
>>>    * node object
>>>    */
>>>   OBJ_RETAIN(proc);   /* keep accounting straight */
>>> +    if (0 == mca_ess_hnp_always_use_plm) {
>>>   node->daemon = proc;
>>>   node->daemon_launched = true;
>>>   node->state = ORTE_NODE_STATE_UP;
>>> +    }
>>> 
>>>   /* record that the daemon job is running */
>>>   jdata->num_procs = 1;
>>> 
>> 
> 

