I definitely think you misunderstood the scope of this RFC. The information 
that is so important to you for configuring the mailbox size is available to 
you when you need it. The PML makes this information available through the 
call to add_procs, which comes with all the procs in MPI_COMM_WORLD. So 
ugni doesn’t need anything more than what is available today. [This is of 
course under the assumption that someone cleans up the BTL and removes the 
usage of MPI_COMM_WORLD.]

The real scope of this RFC is to make this information available earlier, so 
that the BTLs have access to some possible number of processes between the 
call to btl_open and the call to btl_all_proc (in other words during btl_init).
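
To illustrate the add_procs point, here is a minimal sketch in C of sizing a 
mailbox lazily, on the first add_procs call, from the number of procs passed 
in. Every name and threshold below is hypothetical and chosen only for 
illustration; none of it is taken from the ugni BTL or from the attached patch.

```c
#include <stddef.h>

/* Hypothetical per-module state; these names do not come from the ugni BTL. */
struct sketch_module {
    size_t mailbox_bytes;   /* 0 until the first add_procs call */
};

/* Illustrative thresholds only (assumed values): larger jobs get smaller
 * per-peer mailboxes so that peers * mailbox size stays bounded. */
static size_t sketch_pick_mailbox_bytes(size_t nprocs)
{
    if (nprocs <= 1024) {
        return 16384;
    }
    if (nprocs <= 16384) {
        return 4096;
    }
    return 1024;
}

/* Stand-in for a BTL add_procs entry point. The proc count arrives as an
 * argument, so the mailbox size can be chosen here, once, without reading
 * a global job size during initialization. */
static int sketch_add_procs(struct sketch_module *m, size_t nprocs)
{
    if (0 == m->mailbox_bytes) {
        m->mailbox_bytes = sketch_pick_mailbox_bytes(nprocs);
    }
    return 0;
}
```

In this sketch the size is fixed by the first call and later calls leave it 
unchanged; whether that is acceptable for a given BTL is a separate question.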

  George.

PS: here is the patch that fixes all issues in ugni.

Attachment: ugni.patch
Description: Binary data

On Jul 31, 2014, at 10:58 , Nathan Hjelm <hje...@lanl.gov> wrote:

> 
> +2^10000000
> 
> This information is absolutely necessary at this point. If someone has a
> better solution they can provide it as an alternative RFC. Until then
> this is how it should be done... Otherwise we lose uGNI support on the
> trunk. Because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there have already been some inflammatory emails about this; let's 
>> discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info is already in OPAL (num local ranks, local 
>> rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for send/recv 
>> communication
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15394.php
