On Wed, Dec 12, 2007 at 01:35:33PM -0500, Jeff Squyres wrote:
> I agree with Gleb's idea.  More below.
> 
> On Dec 12, 2007, at 12:24 PM, Jon Mason wrote:
> 
> > Ok, glad I got this conversation started :)
> >
> > So, we need a slight redesign to determine the cm method (unless forced
> > via commandline arg).  This can be determined by calling all the
> > individual open routines, and having them return a priority based on
> > their ability to function.  For example, the xoob open function will
> > check the mca_btl_openib_component.num_xrc_qps for a non-zero value and
> > return the priority based on that.
> >
> > Of course, if forced then it will only call that specific open function
> > and throw any relevant errors as necessary.
> 
> 
> Close, but I'd do it slightly differently:
> 
> - open() is *only* used for creating MCA params.  It's a bad name, but  
> it's unfortunately the precedent throughout the rest of the OMPI code  
> base.  :-\ (it has roots in the ompi_info command -- ompi_info has to  
> be able to get a full list of all MCA params regardless of what  
> hardware is available on the current system)
> 
> - during the openib component startup, we should add a query()  
> function that does what you describe.  I.e., we query() each endpoint  
> and it either returns a valid priority or "I don't want to be used  
> with this endpoint."
> 
> - there should be a priority MCA param for every CPC.  Perhaps the CPC  
> base can handle this...?  I'm not sure; it may need to be down in each  
> CPC.
> 
> - the list of CPCs that want to run with each endpoint are ordered by  
> priority (ties will be arbitrarily, but deterministically, broken --  
> alphabetical?) and sent around in the modex.
> 
> - when a new connection comes up, the intersection of the CPC lists  
> for the near and far endpoints is computed and the highest priority  
> CPC is used to make the connection.  Since everyone has the same data,  
> both sides will make the same decision.
> 
> - CPC init may have to change a bit -- more than one CPC may be used  
> for a given endpoint because both the local module and the remote  
> module are involved in making the decision of which CPC is used.
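A minimal C sketch of the query/order/intersect steps listed above. It is illustrative only. The struct, the function names, the CPC names, and the priority values are hypothetical and are not the actual openib BTL code. A negative query() result stands in for "I don't want to be used with this endpoint." To combine the two endpoints' priorities, the sketch sums them. That is one symmetric choice among several, picked so both sides reach the same answer even if they advertised different priorities.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-CPC record as it might be advertised in the modex. */
typedef struct {
    const char *name;  /* CPC name, e.g. "oob" or "xoob" (illustrative) */
    int priority;      /* query() result; higher is preferred, <0 = decline */
} cpc_entry_t;

/* Descending priority; equal priorities are broken alphabetically so
 * that every process derives the same order from the same data. */
static int cpc_compare(const void *a, const void *b)
{
    const cpc_entry_t *x = a;
    const cpc_entry_t *y = b;
    if (x->priority != y->priority)
        return (x->priority > y->priority) ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* Drop CPCs whose query() declined (negative priority), then sort the
 * rest.  Returns how many entries were kept at the front of `list`. */
size_t cpc_build_list(cpc_entry_t *list, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (list[i].priority >= 0)
            list[kept++] = list[i];
    qsort(list, kept, sizeof(*list), cpc_compare);
    return kept;
}

/* Return the best CPC present in both lists, or NULL if there is none.
 * Ranking by the sum of both priorities (same alphabetical tie-break)
 * gives the same result regardless of which side runs it. */
const char *cpc_select(const cpc_entry_t *local, size_t nlocal,
                       const cpc_entry_t *remote, size_t nremote)
{
    const char *best = NULL;
    int best_score = 0;

    assert(nlocal == 0 || local != NULL);
    assert(nremote == 0 || remote != NULL);

    for (size_t i = 0; i < nlocal; ++i) {
        for (size_t j = 0; j < nremote; ++j) {
            if (strcmp(local[i].name, remote[j].name) != 0)
                continue;
            int score = local[i].priority + remote[j].priority;
            if (best == NULL || score > best_score ||
                (score == best_score && strcmp(local[i].name, best) < 0)) {
                best = local[i].name;
                best_score = score;
            }
        }
    }
    return best;
}
```

Each process would run cpc_build_list() on its own query() results before publishing them in the modex. It would then call cpc_select() with its own list and the peer's list when a connection comes up.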
> 
> After this first cut is done, we should probably also add  
> btl_openib_cpc_include and btl_openib_cpc_exclude as I described in a  
> prior mail (just like *_if_include and *_if_exclude in several BTLs)  
> to include/exclude sets of CPCs at run-time.
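A rough C sketch of how the proposed include/exclude filtering might behave. btl_openib_cpc_include and btl_openib_cpc_exclude are only proposed here, so several details are assumptions: the comma-separated value format, the helper names, and treating "both set" as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` appears as a whole token in the comma-separated list
 * `csv` (e.g. "oob,xoob"); "oob" does not match inside "xoob". */
static bool cpc_in_list(const char *csv, const char *name)
{
    size_t len = strlen(name);
    const char *p = csv;
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t tok = end ? (size_t)(end - p) : strlen(p);
        if (tok == len && strncmp(p, name, len) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return false;
}

/* Hypothetical check: include/exclude are the parameter values, NULL
 * meaning "not set".  Returns 1 if the CPC may be used, 0 if filtered
 * out, and -1 if both parameters were set (assumed to be an error). */
int cpc_allowed(const char *include, const char *exclude, const char *name)
{
    assert(name != NULL);
    if (include != NULL && exclude != NULL)
        return -1;
    if (include != NULL)
        return cpc_in_list(include, name) ? 1 : 0;
    if (exclude != NULL)
        return cpc_in_list(exclude, name) ? 0 : 1;
    return 1;
}
```

In this sketch the filter would run before any query(), so an excluded CPC never reaches the modex list at all.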
> 
> > If this sounds sane, then let me know and I'll start coding it up.
> 
> 
> This has actually been on my to-do list for too long; if you have the  
> cycles to do this now, it would be great...

Since I need to have it done before I can do my rdma_cm bits, I'll add
this to my queue and get started immediately.

> 
> I'll make you a bargain: if you do the stuff above, I'll add in the  
> configure/build mojo for selectively compiling the XOOB CPC or not  
> (depending on whether the underlying system has XRC library support or  
> not).  Cool?
> 
> Let's go off on a /tmp-public branch for this so we don't hose the  
> trunk...  I just made /tmp-public/openib-cpc.
> 
> -- 
> Jeff Squyres
> Cisco Systems
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
