I agree with Gleb's idea.  More below.

On Dec 12, 2007, at 12:24 PM, Jon Mason wrote:

> Ok, glad I got this conversation started :)
>
> So, we need a slight redesign to determine the cm method (unless forced
> via commandline arg).  This can be determined by calling all the
> individual open routines, and having them return a priority based on
> their ability to function.  For example, the xoob open function will
> check the mca_btl_openib_component.num_xrc_qps for a non-zero value and
> return the priority based on that.
>
> Of course, if forced then it will only call that specific open function
> and throw any relevant errors as necessary.


Close, but I'd do it slightly differently:

- open() is *only* used for creating MCA params. It's a bad name, but it's unfortunately the precedent throughout the rest of the OMPI code base. :-\ (it has roots in the ompi_info command -- ompi_info has to be able to get a full list of all MCA params regardless of what hardware is available on the current system)

- during the openib component startup, we should add a query() function that does what you describe. I.e., for each endpoint we query() each CPC, and it either returns a valid priority or "I don't want to be used with this endpoint."

- there should be a priority MCA param for every CPC. Perhaps the CPC base can handle this...? I'm not sure; it may need to be down in each CPC.

- the list of CPCs that want to run with each endpoint is ordered by priority (ties will be arbitrarily, but deterministically, broken -- alphabetical?) and sent around in the modex.

- when a new connection comes up, the intersection of the CPC lists for the near and far endpoints is computed and the highest priority CPC is used to make the connection. Since everyone has the same data, both sides will make the same decision.

- CPC init may have to change a bit -- more than one CPC may be used for a given endpoint because both the local module and the remote module are involved in making the decision of which CPC is used.
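
To make the selection steps above concrete, here is a rough sketch in C. It is only an illustration, not code from the tree: cpc_entry_t, cpc_build_list() and cpc_select() are hypothetical names, a negative priority is my stand-in for "I don't want to be used with this endpoint", and I'm assuming a given CPC reports the same priority on both sides (if that can differ, the rule would need to be made symmetric some other way).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one CPC's answer to query() for one endpoint. */
typedef struct {
    const char *name;   /* CPC name, e.g. "oob" or "xoob" (illustrative) */
    int priority;       /* higher wins; negative means "declined" (assumption) */
} cpc_entry_t;

/* Descending priority; ties broken alphabetically so the order is deterministic. */
static int cpc_compare(const void *a, const void *b)
{
    const cpc_entry_t *x = (const cpc_entry_t *) a;
    const cpc_entry_t *y = (const cpc_entry_t *) b;
    if (x->priority != y->priority) {
        return (x->priority > y->priority) ? -1 : 1;
    }
    return strcmp(x->name, y->name);
}

/* Drop the CPCs that declined, then sort what is left.  Returns the new
 * length; this ordered list is what would be sent around in the modex. */
size_t cpc_build_list(cpc_entry_t *list, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i) {
        if (list[i].priority >= 0) {
            list[kept++] = list[i];
        }
    }
    qsort(list, kept, sizeof(list[0]), cpc_compare);
    return kept;
}

/* Walk the local list in priority order and return the first CPC that the
 * remote endpoint also offers, or NULL if the intersection is empty. */
const char *cpc_select(const cpc_entry_t *local, size_t nlocal,
                       const cpc_entry_t *remote, size_t nremote)
{
    for (size_t i = 0; i < nlocal; ++i) {
        for (size_t j = 0; j < nremote; ++j) {
            if (0 == strcmp(local[i].name, remote[j].name)) {
                return local[i].name;
            }
        }
    }
    return NULL;
}
```

Each side would run cpc_build_list() once per endpoint at startup and cpc_select() when a connection comes up; under the same-priority assumption, swapping the local and remote arguments picks the same CPC.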

After this first cut is done, we should probably also add btl_openib_cpc_include and btl_openib_cpc_exclude as I described in a prior mail (just like *_if_include and *_if_exclude in several BTLs) to include/exclude sets of CPCs at run-time.

> If this sounds sane, then let me know and I'll start coding it up.


This has actually been on my to-do list for too long; if you have the cycles to do this now, it would be great...

I'll make you a bargain: if you do the stuff above, I'll add in the configure/build mojo for selectively compiling the XOOB CPC or not (depending on whether the underlying system has XRC library support or not). Cool?

Let's go off on a /tmp-public branch for this so we don't hose the trunk... I just made /tmp-public/openib-cpc.

--
Jeff Squyres
Cisco Systems
