This approach will work for now, but we need to start thinking about how we want to support multiple simultaneous BTL users. Does each user call add_procs with a single module (or set of modules), or does each user call btl_component_init and get its own module? If we do the latter, it might make sense to add a max_procs argument to btl_component_init. Keep in mind that we need to change the btl_component_init interface anyway, because the threading arguments no longer make sense in their current form.
-Nathan

On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
> Like I said, why don't we just do the following:
>
> > I'd like to suggest an alternative solution. A BTL can exploit whatever
> > data it wants, but should first test if the data is available. If the data
> > is *required*, then the BTL gracefully disqualifies itself. If the data is
> > *desirable* for optimization, then the BTL writer (if they choose) can
> > include an alternate path that doesn't do the optimization if the data
> > isn't available.
>
> Seems like this should resolve the disagreement in a way that meets
> everyone's need. It basically is an attribute approach, but not requiring
> modification of the BTL interface.
>
>
> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>
> > Hi George,
> >
> > The ompi_process_info.num_procs thing is what seems to have been an object
> > of some contention yesterday.
> >
> > The ugni use of this is cloned off of the way I designed the mpich netmod.
> > Leveraging off the size of the job was an easy way to scale the mailbox size.
> >
> > If I'd been asked to have the netmod work in a context like it appears we
> > may eventually want to be using BTLs in - not just within ompi but for
> > other things - I'd have worked with Darius (if still in the mpich world)
> > on changing the netmod initialization method to allow an optional
> > attributes struct to be passed into the init method, to give hints about
> > how many connections may need to be established, etc.
> >
> > For the GNI BTL - the way it's currently designed - if you are only
> > expecting to use it for a limited number of connections, then you want to
> > initialize for big mailboxes (IBers can think of many large buffers posted
> > as RX WQEs). But for very large jobs, with a possibly highly connected
> > communication pattern, you want very small mailboxes.
> >
> > Howard
> >
> >
> > -----Original Message-----
> > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
> > Sent: Thursday, July 31, 2014 9:09 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >
> > What is your definition of "global job size"?
> >
> >   George.
> >
> > On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
> >
> >> Hi Folks,
> >>
> >> I think given the way we want to use the btl's in lower levels like
> >> opal, it is pretty disgusting for a btl to need to figure out on its
> >> own something like a "global job size". That's not its business.
> >> Can't we add some attributes to the component's initialization method
> >> that provide hints for how to allocate the resources it needs to provide
> >> its functionality?
> >>
> >> I'll see if there's something clever that can be done in ugni for now.
> >> I can always add in a hack to probe the app's placement info file and
> >> scale the smsg blocks by the number of nodes rather than the number of
> >> ranks.
> >>
> >> Howard
> >>
> >>
> >> -----Original Message-----
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
> >> Hjelm
> >> Sent: Thursday, July 31, 2014 8:58 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] RFC: job size info in OPAL
> >>
> >>
> >> +2^10000000
> >>
> >> This information is absolutely necessary at this point. If someone has a
> >> better solution they can provide it as an alternative RFC. Until then this
> >> is how it should be done... Otherwise we lose uGNI support on the trunk,
> >> because we ARE NOT going to remove the mailbox size optimization.
> >>
> >> -Nathan
> >>
> >> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
> >>> WHAT: Should we make the job size (i.e., initial number of procs)
> >>> available in OPAL?
> >>>
> >>> WHY: At least 2 BTLs are using this info (*more below)
> >>>
> >>> WHERE: usnic and ugni
> >>>
> >>> TIMEOUT: there's already been some inflammatory emails about this;
> >>> let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
> >>>
> >>> MORE DETAIL:
> >>>
> >>> This is an open question. We *have* the information at the time that the
> >>> BTLs are initialized: do we allow that information to go down to OPAL?
> >>>
> >>> Ralph added this info down in OPAL in r32355, but George reverted it in
> >>> r32361.
> >>>
> >>> Points for: YES, WE SHOULD
> >>> +++ 2 BTLs were using it (usnic, ugni)
> >>> +++ Other RTE job-related info is already in OPAL (num local ranks,
> >>>     local rank)
> >>>
> >>> Points for: NO, WE SHOULD NOT
> >>> --- What exactly is this number (e.g., num currently-connected procs?),
> >>>     and when is it updated?
> >>> --- We need to precisely delineate what belongs in OPAL vs. above-OPAL
> >>>
> >>> FWIW: here's how ompi_process_info.num_procs was used before the BTL move
> >>> down to OPAL:
> >>>
> >>> - usnic: for a minor latency optimization / sizing of a shared
> >>>   receive buffer queue length, and for the initial size of a peer
> >>>   lookup hash
> >>> - ugni: to determine the size of the per-peer buffers used for
> >>>   send/recv communication
> >>>
> >>> --
> >>> Jeff Squyres
> >>> jsquy...@cisco.com
> >>> For corporate legal information go to:
> >>> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/devel/2014/07/15373.php
> >> _______________________________________________
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/devel/2014/07/15395.php
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15396.php
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15400.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15402.php