Hi Ralph,

My answer has 2 parts:

1. I'm not familiar with bproc, but I assume that when working with bproc
there is a component that reads the RMAPS information somehow and
launches the local processes. In that case you can set the affinity there
according to the slot_list (a new member in the map) from RMAPS. In any
case, you must do the mapping from the user map to the actual cpu_set
bitmap on the end node, because the head node does not have information
about the internal structure of each and every end node in the grid.

2. My changes did not change anything in the way that ORTE works today;
they just add some functionality. You don't have to use this new
functionality; you can still work as you do today.


Sharon.
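
To illustrate point 1 above (why the slot_list can only be turned into
a CPU bitmap on the end node), here is a minimal sketch in Python. It
is not the ORTE or PLPA code. The function names are hypothetical, the
"socket:core" syntax with inclusive ranges is inferred from the SLOT
example quoted later in this thread, and the linear numbering
cpu_id = socket * cores_per_socket + core is an assumption made only
for illustration; real nodes may number their CPUs differently.

```python
def slot_list_to_cpu_mask(slot_list, cores_per_socket):
    """Translate a 'socket:core' slot list into a CPU bitmask.

    Assumption: logical CPU IDs are numbered linearly, socket by socket.
    cores_per_socket is node-local knowledge, which is why this step
    has to run on the end node rather than on the head node.
    """
    mask = 0
    for entry in slot_list.split(","):
        socket_str, core_str = entry.strip().split(":")
        socket = int(socket_str)
        # The core field is a single index or an inclusive range "lo-hi".
        if "-" in core_str:
            lo_str, hi_str = core_str.split("-")
            lo, hi = int(lo_str), int(hi_str)
        else:
            lo = hi = int(core_str)
        for core in range(lo, hi + 1):
            if core >= cores_per_socket:
                raise ValueError(
                    "core %d does not exist on socket %d" % (core, socket))
            mask |= 1 << (socket * cores_per_socket + core)
    return mask


def mask_to_cpus(mask):
    """List the CPU IDs whose bits are set in the mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    # The same slot list resolves to different CPUs on different nodes.
    print(mask_to_cpus(slot_list_to_cpu_mask("0:1,1:0-1", 2)))  # [1, 2, 3]
    print(mask_to_cpus(slot_list_to_cpu_mask("0:1,1:0-1", 4)))  # [1, 4, 5]
```

The same string yields CPUs 1, 2, 3 on a node with two cores per socket
and CPUs 1, 4, 5 on a node with four, so the head node cannot compute
the bitmap without knowing each node's layout.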

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Ralph H Castain
Sent: Tuesday, July 10, 2007 6:31 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] ticket 1023

Ah, I see the problem. I think you have misunderstood the ODLS
framework.

The ODLS is the "Orte Daemon Launch Subsystem" and is used by the orted
to launch the local procs. Mpirun also accesses the ODLS, but only to
construct the add_procs message that gets sent to the daemons.

The problem is, therefore, that systems which do not use the orteds to
actually launch the backend processes will not have access to the ODLS
on the backend machines. Instead, they use their own internal mechanism
for launching the remote processes. Bproc is an example of this mode of
operation.

So if the mapping is in the ODLS component, then systems that do not use
the orted will not be able to map rank to processor. Does this mean they
cannot set affinity?

For example, this change appears to break bproc's ability to do affinity
since bproc launches the local procs outside of the orteds - is this
true, or can I set affinity without going through the ODLS? That would
be an issue for LANL, I believe.

Thanks
Ralph



On 7/10/07 9:18 AM, "Sharon Melamed" <shar...@voltaire.com> wrote:

> Hi Ralph,
> 
> The responsibility for mapping rank to processor is in the ODLS
> component.
> I didn't touch the orted code.
> 
> If you don't use orted, you still use the ODLS component (like ODLS
> bproc). Anyway, you must have a component on the end machine that builds
> the orte_odls_child_t structure from the RMAPS information and launches
> the local processes. Currently this component is the ODLS. Most of my
> work is in the ODLS component, so if you decide to eliminate the orteds
> you must, somehow, preserve the ODLS functionality.
> 
> Sharon.
> 
>   
> 
> -----Original Message-----
> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
> Behalf Of Ralph H Castain
> Sent: Tuesday, July 10, 2007 4:43 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] ticket 1023
> 
> As I understood our original discussions, this would move
> responsibility for mapping rank to processor back into the orted - is
> that still true?
> 
> Reason I ask is to again clarify for people if we are doing so as it
> (a) impacts those systems that don't use our orteds (e.g., will
> affinity still work in those environments?); and (b) it will make
> elimination of the orteds just a little more difficult.
> 
> So could you please clarify for everyone what this code functionally
> does? All 1023 does is lay out syntax - it doesn't clearly state what
> happens where.
> 
> Thanks
> Ralph
> 
> 
> 
> On 7/10/07 7:32 AM, "Sharon Melamed" <shar...@voltaire.com> wrote:
> 
>> Hello All,
>> 
>>  
>> Over the past few weeks I implemented ticket 1023
>> (https://svn.open-mpi.org/trac/ompi/ticket/1023).
>> 
>> In a few words, the purpose of ticket 1023 is to expand the hostfile
>> syntax to precisely specify slot location (in terms of virtual CPU ID
>> or socket core notation) in the node and/or rank in an MCW.
>> 
>>  
>> 
>> The code is in a temporary branch
>> https://svn.open-mpi.org/svn/ompi/tmp/sharon/
>> 
>> The changes are:
>> 
>> 1. In the RAS base component:
>>    a. Added a new list of orte_ras_proc_t structures.
>>    b. Each orte_ras_proc_t structure contains 3 members: node_name,
>>       rank and cpu_list.
>>    c. The cpu_list is a string representing the slot list from the
>>       hostfile, i.e., if the SLOT token in the hostfile is
>>       SLOT=1@2:1,3:1-4, the slot_list string is: 2:1,3:7-9.
>>  
>> 2. In the RDS hostfile component:
>>    a. Added a new token, SLOT, to the lex parser.
>>    b. Filling the orte_ras_proc_t structure list according to the SLOT
>>       token in the hostfile.
>>  
>> 3. In the RMAPS round robin component:
>>    a. Added a new member to the orte_mapped_node_t structure: slot_list
>>       (similar to the slot_list in the orte_ras_proc_t structure).
>>    b. In orte_rmaps_rr_map, mapping the job according to hostfile ranks
>>       before mapping the job by slot or by node.
>>    c. In orte_rmaps_rr_map, arranging the MCW ranks according to the
>>       hostfile.
>>  
>> 4. In the ODLS default module:
>>    a. Added slot_list to orte_odls_default_get_add_procs_data.
>>    b. Added slot_list to orte_odls_default_launch_local_procs.
>>    c. Added a new member to the child structure: a cpu_set bitmap (for
>>       PLPA).
>>    d. Added mapping of the slot_list string to a cpu_set bitmap in the
>>       child structure.
>>  
>> For more details you can browse the code.
>>  
>> I would like to merge these changes to the trunk as soon as possible
>> since, as I understood from Ralph Castain's emails, Open RTE will go
>> through a lot of changes in the near future, and since this is a
>> relatively small change I want to merge it before the big change.
>>  
>> Any comments?
>>  
>> Sharon.
>>  
>>    
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 
> 


