Re: [OMPI devel] RFC: job size info in OPAL

George Bosilca Thu, 31 Jul 2014 11:09:29 -0400 (EDT)

What is your definition of “global job size”?

  George.


On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:

> Hi Folks,
> 
> I think given the way we want to use the BTLs in lower levels like OPAL,
> it is pretty disgusting for a BTL to need to figure out on its own something
> like a "global job size".  That's not its business.  Can't we add some
> attributes to the component's initialization method that provide hints for
> how to allocate the resources it needs to provide its functionality?
> 
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the app's placement info file and
> scale the smsg blocks by number of nodes rather than number of ranks.
> 
> Howard
> 
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> 
> 
> +2^10000000
> 
> This information is absolutely necessary at this point. If someone has a
> better solution they can provide it as an alternative RFC. Until then this is
> how it should be done... Otherwise we lose uGNI support on the trunk,
> because we ARE NOT going to remove the mailbox size optimization.
> 
> -Nathan
> 
> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available 
>> in OPAL?
>> 
>> WHY: At least 2 BTLs are using this info (*more below)
>> 
>> WHERE: usnic and ugni
>> 
>> TIMEOUT: there have already been some inflammatory emails about this;
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>> 
>> MORE DETAIL:
>> 
>> This is an open question.  We *have* the information at the time that the 
>> BTLs are initialized: do we allow that information to go down to OPAL?
>> 
>> Ralph added this info down in OPAL in r32355, but George reverted it in 
>> r32361.
>> 
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info is already in OPAL (num local ranks,
>>     local rank)
>> 
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and 
>> when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>> 
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move 
>> down to OPAL:
>> 
>> - usnic: for a minor latency optimization / sizing of a shared receive 
>> buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for 
>> send/recv communication
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
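
For illustration only, the "hints at component initialization" idea raised
in the thread could look roughly like the sketch below. It is not Open MPI
code: the struct, the constants, and the function name are all hypothetical,
and the sizing policy (split a fixed memory budget across the expected
peers, then clamp) is an assumption, not how ugni or usnic actually size
their buffers. The point is only that the caller supplies a peer-count hint
and the component never has to discover a "global job size" on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint block handed to a component at init time.
 * None of these names exist in Open MPI. */
typedef struct {
    size_t expected_peers; /* caller's estimate of peer count; 0 = unknown */
    size_t memory_budget;  /* bytes the component may spend on per-peer buffers */
} example_btl_hints_t;

/* Assumed limits, chosen only for this example. */
#define EXAMPLE_MIN_MAILBOX   512u
#define EXAMPLE_MAX_MAILBOX   65536u
#define EXAMPLE_DEFAULT_PEERS 1024u  /* fallback when no hint is given */

/* Choose a per-peer mailbox size: divide the budget evenly across the
 * expected peers, then clamp the result to [MIN, MAX]. */
static size_t example_mailbox_size(const example_btl_hints_t *hints)
{
    assert(hints != NULL);

    size_t peers = hints->expected_peers ? hints->expected_peers
                                         : EXAMPLE_DEFAULT_PEERS;
    size_t size = hints->memory_budget / peers;

    if (size < EXAMPLE_MIN_MAILBOX) {
        size = EXAMPLE_MIN_MAILBOX;
    }
    if (size > EXAMPLE_MAX_MAILBOX) {
        size = EXAMPLE_MAX_MAILBOX;
    }
    return size;
}
```

With a hint like this, the caller decides what the peer count means (ranks,
nodes, or currently connected procs), which is the open question the RFC
raises; the component only consumes the number.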
