On Jan 4, 2013, at 12:57 PM, George Bosilca <bosi...@icl.utk.edu> wrote:

> Sam,
> 
> This is a major change and would have deserved an RFC, as it imposes a 
> drastic/major non-scalable change (up to now the backend file creation was 
> centralized; now, in addition, we exchange the data through the modex).

I guess that's subject to interpretation as to how drastic and non-scalable 
this is - but this was discussed at length at last year's developer meeting and 
in subsequent calls.

> A quick look highlights the fact that quite a lot of new modex entries have 
> appeared after this patch. On a 4 proc (2x2) run we got more than 20 entries, 
> each one of them up to 32 bytes (the list is attached at the end of this email).

It looks to me like an optimization is missing - I suspect because Sam was 
doing this in two phases. When completed, we should see only the local-rank=0 
proc on each node emit three keys (hoping to condense that to only two as I 
believe one may no longer be needed). All other procs will simply consume them.
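
To make the intended end state concrete, here is a rough sketch in Python (a toy 
model, not Open MPI code). The dict standing in for the modex, the publish() and 
lookup() helpers, and the planned key names "a", "b", "c" are all hypothetical; 
the layout and the five key suffixes are taken from the log quoted below, and 
the 32-byte bound is the figure quoted above.

```python
MAX_VALUE_BYTES = 32  # per-entry upper bound quoted in this thread


def publish(modex, procs_by_node, key_names):
    """Only the local-rank=0 proc on each node stores keys in the modex."""
    for ranks in procs_by_node.values():
        publisher = ranks[0]  # assumption: first rank listed is local-rank 0
        for name in key_names:
            # Placeholder payload at the quoted maximum size.
            modex[(publisher, name)] = b"\0" * MAX_VALUE_BYTES


def lookup(modex, procs_by_node, rank, name):
    """Any proc reads the entry published by local-rank 0 on its own node."""
    for ranks in procs_by_node.values():
        if rank in ranks:
            return modex[(ranks[0], name)]
    raise KeyError(rank)


def total_bytes(modex):
    """Sum of payload sizes currently held in the toy modex."""
    return sum(len(value) for value in modex.values())


# 2x2 layout from the quoted log: ranks 0 and 2 on dancer01, 1 and 3 on dancer02.
layout = {"dancer01": [0, 2], "dancer02": [1, 3]}

# Five key suffixes per publishing proc, as seen in the log.
as_logged = {}
publish(as_logged, layout, ["0-0", "0-1", "1-0", "1-1", "2"])

# Three keys per node once the optimization is in; names are placeholders.
planned = {}
publish(planned, layout, ["a", "b", "c"])

print(len(as_logged), total_bytes(as_logged))
print(len(planned), total_bytes(planned))
```

In this model the number of published entries grows with the number of nodes, 
not with the number of procs: adding more procs to a node in the layout adds 
consumers but no new keys.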

> 
> Clearly this new approach is significantly less scalable compared with the 
> old one.

It shouldn't be, when completed.

> In the past we had issues adding one single integer per process; I fail to 
> understand how our standards changed so much that now a few hundred bytes per 
> process have become acceptable. Moreover, what benefit does this change 
> provide in exchange for this loss of scalability?

We need to remove the RML from the shared memory startup. In the past, we used 
the RML to pass the rendezvous info between the procs on the node. Once the 
BTLs are moved, this will no longer be possible. So the only other option for 
now is to use the modex.

As has been discussed at the meetings, future plans (once the BTLs have moved) 
will remove this info from the modex except for direct-launch cases. For cases 
where we have orteds, the orteds will open the backing files and pass the info 
down to the local procs, thus eliminating the entire rendezvous protocol. 
Hopefully, we'll get there in the not-too-distant future.
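
As a back-of-the-envelope illustration of that plan (again a Python toy model, 
with a hypothetical function name and parameters, not anything in the code 
base): with orteds the shared memory info would not touch the modex at all, and 
only direct-launched jobs would still publish per-node keys.

```python
def sm_modex_keys(num_nodes, keys_per_node, has_orteds):
    """Shared memory keys a job would place in the modex under the future plan.

    Assumption: with orteds, the daemon on each node opens the backing files
    and hands the info to its local procs, so nothing goes through the modex.
    With direct launch there is no daemon, so local-rank 0 on each node still
    publishes its keys.
    """
    if has_orteds:
        return 0
    return num_nodes * keys_per_node


# Two nodes, three keys per node, as in the 2x2 example in this thread.
print(sm_modex_keys(2, 3, has_orteds=True))
print(sm_modex_keys(2, 3, has_orteds=False))
```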



> 
>  George.
> 
> PS: The exhaustive list of new SM-related modex entries:
> [dancer01:01049] [[50563,1],0] db:hash:store: storing key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer02:01720] [[50563,1],1] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],0]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],0]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer01:01049] [[50563,1],0] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer02:01721] [[50563,1],3] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-0-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-0[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-1-1[OPAL_STRING] for proc [[50563,1],1]
> [dancer01:01050] [[50563,1],2] db:hash:store: storing pointer of key 
> btl.sm.1.9-2[OPAL_BYTE_OBJECT] for proc [[50563,1],1]
> 
> 
> On Jan 3, 2013, at 22:52 , svn-commit-mai...@open-mpi.org wrote:
> 
>> Author: samuel (Samuel K. Gutierrez)
>> Date: 2013-01-03 16:52:20 EST (Thu, 03 Jan 2013)
>> New Revision: 27739
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/27739
>> 
>> Log:
>> sm BTL initialization via modex, as discussed at last year's meeting.
>> 
>> Text files modified:
>>  trunk/ompi/mca/btl/sm/btl_sm.c                      |   337 +++++++++++++++++++++--------
>>  trunk/ompi/mca/btl/sm/btl_sm.h                      |    60 +++++
>>  trunk/ompi/mca/btl/sm/btl_sm_component.c            |   444 ++++++++++++++++++++++++++++++++++++++-
>>  trunk/ompi/mca/btl/sm/help-mpi-btl-sm.txt           |     6
>>  trunk/ompi/mca/common/sm/common_sm.c                |    92 +++++--
>>  trunk/ompi/mca/common/sm/common_sm.h                |    45 +++
>>  trunk/ompi/mca/mpool/sm/mpool_sm.h                  |    17
>>  trunk/ompi/mca/mpool/sm/mpool_sm_component.c        |   111 ++++-----
>>  trunk/opal/mca/shmem/mmap/shmem_mmap_module.c       |     7
>>  trunk/opal/mca/shmem/posix/shmem_posix_module.c     |     9
>>  trunk/opal/mca/shmem/shmem_types.h                  |    36 ++
>>  trunk/opal/mca/shmem/sysv/shmem_sysv_module.c       |    11
>>  trunk/opal/mca/shmem/windows/shmem_windows_module.c |     7
>>  13 files changed, 933 insertions(+), 249 deletions(-)
> 
> 

