Fair enough - yeah, that is an issue I've been avoiding :-)

On Jul 31, 2014, at 9:14 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
> This approach will work now, but we need to start thinking about how
> we want to support multiple simultaneous BTL users. Does each user
> call add_procs with a single module (or set of modules), or does each
> user call btl_component_init and get its own module? If we do the
> latter, then it might make sense to add a max_procs argument to
> btl_component_init. Keep in mind we need to change the
> btl_component_init interface anyway, because the threading arguments
> no longer make sense in their current form.
>
> -Nathan
>
> On Thu, Jul 31, 2014 at 09:04:09AM -0700, Ralph Castain wrote:
>> Like I said, why don't we just do the following:
>>
>>> I'd like to suggest an alternative solution. A BTL can exploit
>>> whatever data it wants, but should first test whether the data is
>>> available. If the data is *required*, then the BTL gracefully
>>> disqualifies itself. If the data is *desirable* for optimization,
>>> then the BTL writer (if they choose) can include an alternate path
>>> that doesn't do the optimization if the data isn't available.
>>
>> Seems like this should resolve the disagreement in a way that meets
>> everyone's needs. It is basically an attribute approach, but one
>> that doesn't require modifying the BTL interface.
>>
>> On Jul 31, 2014, at 8:26 AM, Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>>
>>> Hi George,
>>>
>>> The ompi_process_info.num_procs thing that seems to have been an
>>> object of some contention yesterday.
>>>
>>> The ugni use of this is cloned off of the way I designed the mpich
>>> netmod. Leveraging the size of the job was an easy way to scale the
>>> mailbox size.
>>>
>>> If I'd been asked to make the netmod work in the kind of context
>>> where it appears we may eventually want to use BTLs - not just
>>> within ompi but for other things - I'd have worked with Darius (if
>>> still in the mpich world) on changing the netmod initialization
>>> method to allow an optional attributes struct to be passed into the
>>> init method, giving hints about how many connections may need to be
>>> established, etc.
>>>
>>> For the GNI BTL - the way it's currently designed - if you are only
>>> expecting to use it for a limited number of connections, then you
>>> want to initialize with big mailboxes (IBers can think of these as
>>> many large buffers posted as RX WQEs). But for very large jobs,
>>> with a possibly highly connected communication pattern, you want
>>> very small mailboxes.
>>>
>>> Howard
>>>
>>> -----Original Message-----
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George Bosilca
>>> Sent: Thursday, July 31, 2014 9:09 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>>>
>>> What is your definition of "global job size"?
>>>
>>> George.
>>>
>>> On Jul 31, 2014, at 11:06 , Pritchard Jr., Howard <howa...@lanl.gov> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> I think given the way we want to use the BTLs in lower levels like
>>>> opal, it is pretty disgusting for a BTL to need to figure out on
>>>> its own something like a "global job size". That's not its
>>>> business. Can't we add some attributes to the component's
>>>> initialization method that provide hints for how to allocate the
>>>> resources it needs to provide its functionality?
>>>>
>>>> I'll see if there's something clever that can be done in ugni for
>>>> now. I can always add in a hack to probe the app's placement info
>>>> file and scale the smsg blocks by number of nodes rather than
>>>> number of ranks.
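To make the "attributes at init" idea above concrete, here is a
minimal sketch in C of a hint struct passed to a BTL component init,
folding in Nathan's max_procs suggestion. Every name here (the struct,
its fields, the init signature) is hypothetical, not the actual Open
MPI BTL interface:

    /* Hypothetical sketch only: none of these names exist in the real
     * Open MPI BTL interface; they illustrate the "hints at init" idea
     * discussed above. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct example_btl_init_attributes_t {
        uint32_t max_procs;            /* upper bound on procs this user
                                          will pass to add_procs; 0 means
                                          unknown, so the BTL must pick a
                                          safe default */
        uint32_t expected_connections; /* hint: peers likely to be
                                          connected simultaneously */
        bool     want_progress_thread; /* threading hints, replacing the
                                          old integer threading args */
        bool     want_mpi_threads;
    } example_btl_init_attributes_t;

    /* Each BTL user would call the component init with its own hints
     * and get back its own module(s), sized accordingly. */
    struct mca_btl_base_module_t **
    example_btl_component_init(int *num_modules,
                               const example_btl_init_attributes_t *attrs);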
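And a sketch of the inverse scaling Howard describes for the GNI
mailboxes: few peers get big mailboxes, huge jobs get tiny ones so the
aggregate (num_procs times mailbox size) stays bounded. The thresholds
and sizes below are invented for illustration; the real btl/ugni code
uses its own limits:

    /* Illustration only: thresholds and sizes are made up. The point is
     * the inverse relationship between job size and per-peer mailbox
     * size, since each rank allocates one mailbox per potential peer. */
    #include <stddef.h>
    #include <stdint.h>

    static size_t example_smsg_mbox_size(uint32_t num_procs)
    {
        if (num_procs <= 512) {
            return 64 * 1024;  /* small job: large mailboxes keep more
                                  sends on the fast eager path */
        }
        if (num_procs <= 16384) {
            return 8 * 1024;
        }
        return 1024;           /* very large, possibly fully connected
                                  job: tiny mailboxes bound total memory */
    }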
>>>>
>>>> Howard
>>>>
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
>>>> Sent: Thursday, July 31, 2014 8:58 AM
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] RFC: job size info in OPAL
>>>>
>>>> +2^10000000
>>>>
>>>> This information is absolutely necessary at this point. If someone
>>>> has a better solution, they can propose it as an alternative RFC.
>>>> Until then, this is how it should be done... Otherwise we lose
>>>> uGNI support on the trunk, because we ARE NOT going to remove the
>>>> mailbox size optimization.
>>>>
>>>> -Nathan
>>>>
>>>> On Wed, Jul 30, 2014 at 10:00:18PM +0000, Jeff Squyres (jsquyres) wrote:
>>>>> WHAT: Should we make the job size (i.e., initial number of procs)
>>>>> available in OPAL?
>>>>>
>>>>> WHY: At least 2 BTLs are using this info (*more below)
>>>>>
>>>>> WHERE: usnic and ugni
>>>>>
>>>>> TIMEOUT: there have already been some inflammatory emails about
>>>>> this; let's discuss next Tuesday on the teleconf: Tue, 5 Aug 2014
>>>>>
>>>>> MORE DETAIL:
>>>>>
>>>>> This is an open question. We *have* the information at the time
>>>>> the BTLs are initialized: do we allow that information to go down
>>>>> to OPAL?
>>>>>
>>>>> Ralph added this info down in OPAL in r32355, but George reverted
>>>>> it in r32361.
>>>>>
>>>>> Points for: YES, WE SHOULD
>>>>> +++ 2 BTLs were using it (usnic, ugni)
>>>>> +++ Other RTE job-related info is already in OPAL (num local
>>>>>     ranks, local rank)
>>>>>
>>>>> Points for: NO, WE SHOULD NOT
>>>>> --- What exactly is this number (e.g., num currently-connected
>>>>>     procs?), and when is it updated?
>>>>> --- We need to precisely delineate what belongs in OPAL vs.
>>>>>     above OPAL
>>>>>
>>>>> FWIW, here's how ompi_process_info.num_procs was used before the
>>>>> BTL move down to OPAL:
>>>>>
>>>>> - usnic: for a minor latency optimization / sizing of a shared
>>>>>   receive buffer queue length, and for the initial size of a peer
>>>>>   lookup hash
>>>>> - ugni: to determine the size of the per-peer buffers used for
>>>>>   send/recv communication
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>> For corporate legal information go to:
>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
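One way to reconcile the two positions in the RFC is Ralph's "test
first" rule from earlier in the thread: treat the job size as
*desirable*, not *required*. A minimal sketch of that pattern follows,
where example_job_size_lookup() stands in for whatever query the
runtime layer ends up exposing; it is an assumed helper, not a real
OPAL call:

    /* Sketch of the graceful-fallback pattern: use the job size when
     * the runtime provides it, fall back to a conservative default
     * when it does not.  example_job_size_lookup() is assumed. */
    #include <stdbool.h>
    #include <stdint.h>

    #define EXAMPLE_DEFAULT_PEERS 256  /* safe guess when size unknown */

    /* Stub for illustration; a real runtime would consult its modex or
     * PMI data and return false only when nothing was published. */
    static bool example_job_size_lookup(uint32_t *num_procs)
    {
        (void) num_procs;
        return false;
    }

    static int example_btl_size_resources(void)
    {
        uint32_t num_procs;

        if (!example_job_size_lookup(&num_procs)) {
            /* The data is only *desirable* here, so don't disqualify
             * the BTL; just skip the optimization.  A BTL that truly
             * *requires* it would return an error instead. */
            num_procs = EXAMPLE_DEFAULT_PEERS;
        }

        /* ... size per-peer buffers / hash tables from num_procs ... */
        return 0;
    }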