On Jul 14, 2008, at 3:04 PM, Ralph H. Castain wrote:

I've been quietly following this discussion, but now feel a need to jump in here. I really must disagree with the idea of building either IBCM or RDMACM support by default. Neither of these has been proven to reliably work, or to be advantageous. Our own experiences in testing them have been
slightly negative at best. When they did work, they were slower, didn't
scale well, and were unreliable.

Minor clarification: we did not test RDMACM on RoadRunner.

We only tested IBCM at scale (not RDMACM) and ran into a variety of issues -- most of which were bugs in Open MPI's use of IBCM -- that culminated in the ib_cm_listen() problem. That problem is currently unsolved, and I agree that it unfortunately currently makes OMPI's IBCM support fairly useless. Bonk.

IBCM was thought to be a nice thing: a cheap/fast way to make IB connections that would get OOB out of the picture. If the ib_cm_listen() problem is fixed, it may still be (Sean had an interesting suggestion; we'll see where it goes). But I totally agree that it is somewhat of an unknown quantity at this point. I also agree that the IBCM support in OMPI is not *necessary* because OOB works just fine (especially with the scalability improvements in v1.3).

RDMACM, on the other hand, is *necessary* for iWARP connections. We know it won't scale well because of ARP issues, to which the iWARP vendors are publishing their own solutions (pre-populating ARP caches, etc.). Even when built and installed, RDMACM will not be used by default for IB hardware (you have to specifically ask for it). Since it's necessary for iWARP, I think we need to build and install it by default. Most importantly: production IB users won't be disturbed.

I'm not trying to rain on anyone's parade. These are worthwhile in the
long term. However, they clearly need further work to be "ready for prime
time".

Accordingly, I would recommend that they -only- be built if specifically
requested. Remember, most of our users just build blindly. It makes no
sense to have them build support for what can only be classed as an
experimental capability at this time.

I defer to Mellanox for a decision about the IBCM CPC.

But for the RDMACM, per above, I am still in favor of building and installing it by default.

Also, note that the OFED install is less-than-reliable wrt IBCM and
RDMACM.

True; the OFED install is less-than-reliable w.r.t. IBCM per the previously-discussed issue of not necessarily creating the /dev/infiniband/ucm* devices. There's a ticket open on the OpenFabrics bugzilla about it. I wish it would get fixed. :-)
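
For anyone who wants to check a node for this, here is a minimal Python sketch. It assumes only the device path mentioned above; the function name find_ucm_devices and its dev_dir parameter are made up for illustration. Finding the device files says nothing about whether IBCM itself works on that node.

```python
import glob
import os


def find_ucm_devices(dev_dir="/dev/infiniband"):
    """Return the sorted ucm* entries under dev_dir (empty list if none).

    dev_dir is a hypothetical parameter so that the check can be pointed
    at a different directory; the default is the path discussed above.
    """
    return sorted(glob.glob(os.path.join(dev_dir, "ucm*")))


if __name__ == "__main__":
    devices = find_ucm_devices()
    if devices:
        print("ucm devices present:", ", ".join(devices))
    else:
        print("no ucm* devices found under /dev/infiniband")
```

Running this on the head node and on a back-end node would show whether the two installations differ in this respect.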

But I've not seen any problems with OFED's RDMACM installation.

The only issue I've seen with RDMACM is when sites consciously chose to put the RDMACM libraries and/or modules on the head node (and therefore OMPI built support for it), but then chose not to put them out on back-end compute nodes. Keep in mind that this is *not* the default OFED installation pattern -- a human has to go manually modify a file to make it do that (I don't believe that there's even a menu option for that installation mode; you have to go hand-edit an OFED installation configuration file, or simply choose not to install certain RPMs on the back-end nodes, or remove them from those nodes).

We have spent considerable time chasing down installation problems
that allowed the system to build, but then caused it to crash-and-burn if we attempted to use it. We have gained experience at knowing when/where to look now, but that doesn't lessen the reputation impact OMPI is getting as
a "buggy, cantankerous beast" according to our sys admins.

Isn't the whole point of pre-release test versions to find and fix such bugs? ;-)

Not a reputation we should be encouraging.

Turning this off by default allows those more adventurous souls to explore this capability, while letting our production-oriented customers install
and run in peace.


Pasha was recommending that IBCM be built by default *but not used by default*. So production users would still be able to run in peace -- OOB will still be the default. I see it pretty much like SLURM support: it's built by default, but it won't activate itself unless relevant. But like I said above, I defer to Mellanox for IBCM. :-)

Just my $0.00000000002...

--
Jeff Squyres
Cisco Systems
