What is your definition of “global job size”?

  George.

On Jul 31, 2014, at 11:06, Pritchard Jr., Howard <howa...@lanl.gov> wrote:

> Hi Folks,
>
> I think given the way we want to use the btl's in lower levels like opal,
> it is pretty disgusting for a btl to need to figure out on its own something
> like a "global job size". That's not its business. Can't we add some
> attributes to the component's initialization method that provides hints for
> how to allocate resources it needs to provide its functionality?
>
> I'll see if there's something clever that can be done in ugni for now.
> I can always add in a hack to probe the apps placement info file and
> scale the smsg blocks by number of nodes rather than number of ranks.
>
> Howard
>
>
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Thursday, July 31, 2014 8:58 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>
>
> +2^10000000
>
> This information is absolutely necessary at this point. If someone has a
> better solution they can provide it as an alternative RFC. Until then this is
> how it should be done... Otherwise we lose uGNI support on the trunk.
> Because we ARE NOT going to remove the mailbox size optimization.
>
> -Nathan
>
> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
>> WHAT: Should we make the job size (i.e., initial number of procs) available
>> in OPAL?
>>
>> WHY: At least 2 BTLs are using this info (*more below)
>>
>> WHERE: usnic and ugni
>>
>> TIMEOUT: there's already been some inflammatory emails about this;
>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>
>> MORE DETAIL:
>>
>> This is an open question. We *have* the information at the time that the
>> BTLs are initialized: do we allow that information to go down to OPAL?
>>
>> Ralph added this info down in OPAL in r32355, but George reverted it in
>> r32361.
>>
>> Points for: YES, WE SHOULD
>> +++ 2 BTLs were using it (usnic, ugni)
>> +++ Other RTE job-related info are already in OPAL (num local ranks,
>>     local rank)
>>
>> Points for: NO, WE SHOULD NOT
>> --- What exactly is this number (e.g., num currently-connected procs?), and
>>     when is it updated?
>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
>>
>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move
>> down to OPAL:
>>
>> - usnic: for a minor latency optimization / sizing of a shared receive
>>   buffer queue length, and for the initial size of a peer lookup hash
>> - ugni: to determine the size of the per-peer buffers used for
>>   send/recv communication
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php