I do not like the fact that add_procs is called with every proc in
MPI_COMM_WORLD. That needs to change, so I will not rely on the number
of procs being added matching the world or universe size.

-Nathan

On Thu, Jul 31, 2014 at 09:22:00AM -0600, George Bosilca wrote:
>    I definitely think you misunderstood the scope of this RFC. The
>    information that is so important to you for configuring the mailbox size is
>    available to you when you need it. This information is made available by
>    the PML through the call to add_procs, which comes with all the procs in
>    MPI_COMM_WORLD. So ugni doesn't need anything more than what is
>    available today. [This is of course under the assumption that someone
>    cleans up the BTL and removes the usage of MPI_COMM_WORLD.]
> 
>    The real scope of this RFC is to make this information available
>    earlier, so that the BTLs have access to some possible number of
>    processes between the call to btl_open and the call to btl_all_proc (in
>    other words, during btl_init).
> 
>      George.
> 
>    PS: here is the patch that fixes all issues in ugni.
> 
>    On Jul 31, 2014, at 10:58 , Nathan Hjelm <hje...@lanl.gov> wrote:
> 
>    >
>    > +2^10000000
>    >
>    > This information is absolutely necessary at this point. If someone has a
>    > better solution they can provide it as an alternative RFC. Until then
>    > this is how it should be done... Otherwise we lose uGNI support on the
>    > trunk. Because we ARE NOT going to remove the mailbox size optimization.
>    >
>    > -Nathan
>    >
>    > On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
>    >> WHAT: Should we make the job size (i.e., initial number of procs) available in OPAL?
>    >>
>    >> WHY: At least 2 BTLs are using this info (*more below)
>    >>
>    >> WHERE: usnic and ugni
>    >>
>    >> TIMEOUT: there have already been some inflammatory emails about this; let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>    >>
>    >> MORE DETAIL:
>    >>
>    >> This is an open question.  We *have* the information at the time that the BTLs are initialized: do we allow that information to go down to OPAL?
>    >>
>    >> Ralph added this info down in OPAL in r32355, but George reverted it in r32361.
>    >>
>    >> Points for: YES, WE SHOULD
>    >> +++ 2 BTLs were using it (usnic, ugni)
>    >> +++ Other RTE job-related info is already in OPAL (num local ranks, local rank)
>    >>
>    >> Points for: NO, WE SHOULD NOT
>    >> --- What exactly is this number (e.g., num currently-connected procs?), and when is it updated?
>    >> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>    >>
>    >> FWIW: here's how ompi_process_info.num_procs was used before the BTL move down to OPAL:
>    >>
>    >> - usnic: for a minor latency optimization / sizing of a shared receive buffer queue length, and for the initial size of a peer lookup hash
>    >> - ugni: to determine the size of the per-peer buffers used for send/recv communication
>    >>
>    >> --
>    >> Jeff Squyres
>    >> jsquy...@cisco.com
>    >> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>    >>
>    >> _______________________________________________
>    >> devel mailing list
>    >> de...@open-mpi.org
>    >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>    >> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> 

