On Jul 14, 2008, at 3:04 PM, Ralph H. Castain wrote:
I've been quietly following this discussion, but now feel a need to jump
in here. I really must disagree with the idea of building either IBCM or
RDMACM support by default. Neither of these has been proven to reliably
work, or to be advantageous. Our own experiences in testing them have been
slightly negative at best. When they did work, they were slower, didn't
scale well, and were unreliable.
Minor clarification: we did not test RDMACM on RoadRunner.
We only tested IBCM at scale (not RDMACM) and ran into a variety of
issues -- most of which were bugs in Open MPI's use of IBCM -- that
culminated in the ib_cm_listen() problem. That problem is currently
unsolved, and I agree that it unfortunately currently makes OMPI's
IBCM support fairly useless. Bonk.
IBCM was thought to be a nice thing: a cheap/fast way to make IB
connections that would get OOB out of the picture. If the
ib_cm_listen() problem is fixed, it may still be (Sean had an
interesting suggestion; we'll see where it goes). But I totally agree
that it is somewhat of an unknown quantity at this point. I also
agree that the IBCM support in OMPI is not *necessary* because OOB
works just fine (especially with the scalability improvements in v1.3).
RDMACM, on the other hand, is *necessary* for iWARP connections. We
know it won't scale well because of ARP issues, to which the iWARP
vendors are publishing their own solutions (pre-populating ARP caches,
etc.). Even when built and installed, RDMACM will not be used by
default for IB hardware (you have to specifically ask for it). Since
it's necessary for iWARP, I think we need to build and install it by
default. Most importantly: production IB users won't be disturbed.
I'm not trying to rain on anyone's parade. These are worthwhile in the
long term. However, they clearly need further work to be "ready for prime
time".
Accordingly, I would recommend that they -only- be built if specifically
requested. Remember, most of our users just build blindly. It makes no
sense to have them build support for what can only be classed as an
experimental capability at this time.
I defer to Mellanox for a decision about the IBCM CPC.
But for the RDMACM, per above, I am still in favor of building and
installing it by default.
Also, note that the OFED install is less-than-reliable wrt IBCM and
RDMACM.
True; the OFED install is less-than-reliable w.r.t. IBCM per the
previously-discussed issue of not necessarily creating the
/dev/infiniband/ucm* devices. There's a ticket open on the OpenFabrics
bugzilla about it. I wish it would get fixed. :-)
But I've not seen any problems with OFED's RDMACM installation.
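For the IBCM device-file issue specifically, a quick per-node check is to
look for the /dev/infiniband/ucm* entries directly. The sketch below is only
illustrative: the function name find_ucm_devices and its dev_dir parameter
are made up for this example, and finding the device files does not by
itself prove that IBCM will work on that node.

```python
import glob
import os


def find_ucm_devices(dev_dir="/dev/infiniband"):
    """Return the sorted list of ucm* entries found under dev_dir.

    dev_dir is a hypothetical parameter added so the check can be pointed
    at another directory (e.g. for testing); the default is the location
    discussed in this thread.
    """
    return sorted(glob.glob(os.path.join(dev_dir, "ucm*")))


if __name__ == "__main__":
    devices = find_ucm_devices()
    if devices:
        print("Found IBCM device files: " + ", ".join(devices))
    else:
        print("No /dev/infiniband/ucm* devices found on this node")
```

Running this on both the head node and a sample of back-end nodes would
show whether the device files were created consistently.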
The only issue I've seen with RDMACM is when sites consciously chose
to put the RDMACM libraries and/or modules on the head node (and
therefore OMPI built support for it), but then chose not to put them out
on back-end compute nodes. Keep in mind that this is *not* the
default OFED installation pattern -- a human has to go manually modify
a file to make it do that (I don't believe that there's even a menu
option for that installation mode; you have to go hand-edit an OFED
installation configuration file or simply choose not to install, or to
remove, certain RPMs on back-end nodes).
We have spent considerable time chasing down installation problems
that allowed the system to build, but then caused it to crash-and-burn if
we attempted to use it. We have gained experience at knowing when/where to
look now, but that doesn't lessen the reputation impact OMPI is getting as
a "buggy, cantankerous beast" according to our sys admins.
Isn't the whole point of pre-release test versions to find and fix
such bugs? ;-)
Not a reputation we should be encouraging.
Turning this off by default allows those more adventurous souls to explore
this capability, while letting our production-oriented customers install
and run in peace.
Pasha was recommending that IBCM be built by default *but not used by
default*. So production users would still be able to run in peace --
OOB will still be the default. I see it pretty much like SLURM
support: it's built by default, but it won't activate itself unless
relevant. But like I said above, I defer to Mellanox for IBCM. :-)
Just my $0.00000000002...
--
Jeff Squyres
Cisco Systems