One thing I should clarify -- the ibverbs error message from my previous mail is a red herring. libibverbs prints that message on systems where the kernel portions of the OFED stack are not installed (such as the quick-n-dirty test that I did before -- all I did was install libibverbs without the corresponding kernel stuff). I installed the whole OFED stack on a machine with no verbs-capable hardware and verified that the libibverbs message does *not* appear when the kernel bits are properly installed and running.

So we're only talking about the Open MPI warning message here. More below.



On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:

>> 2. An out-of-the-box "mpirun a.out" will print warning messages in
>> perfectly valid/good configurations (no verbs-capable hardware, but
>> just happen to have libibverbs installed).  This is a Big Deal.

> Which is easily solved with a better error message, as Pasha suggested.

I guess this is where we disagree: I don't believe that the issue is solved by making a "better" message. Specifically: this is the first case where we're saying "if you run with a valid configuration, you're going to get a warning message and you have to do something extra to turn it off."

That just seems darn weird to me, especially when other MPIs don't do the same thing. Come to think of it, I can't think of many other software packages that do that.

>> In short: I think it's no longer safe to assume that machines with
>> libibverbs installed must also have verbs-capable hardware.

> But here's the real problem -- with our current selection logic, a user with libibverbs but no IB cards gets an error message saying "hey, we need you to set this flag to make this error go away" (or would, per Pasha's suggestion). A user with a busted IB stack on a node (which we still saw pretty often at LANL) starts using TCP and their application runs like a dog.
>
> I guess it's a matter of how often you see errors in the IB stack that cause NIC initialization to fail. The machines I tend to use still exhibit this problem pretty often, but it's possible I just work on bad hardware more often than is usual in the wild.

I guess this is the central issue: what *is* the common case? Which set of users should be forced to do something different?

I'm claiming that now that the Linux distros are shipping libibverbs, the number of users who have the openib BTL installed but do not have verbs-capable hardware will be *much* larger than the number with verbs-capable hardware. Hence, I think the pain point should fall on the smaller group (those with verbs-capable hardware): set an MCA param if you want to see the warning message.

(we can debate the default value for the BTL-wide base param later -- let's first just debate the *concept* as specific to the openib BTL)
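To make the proposal concrete, here is a minimal sketch of the selection behavior being argued for. It is not Open MPI code: the function names and the warn_if_unused flag are hypothetical stand-ins for whatever MCA param we end up with, and the default shown (off) is the one proposed in this mail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical policy helper, not actual Open MPI code.
 * num_devices:    number of verbs devices found at init time.
 * warn_if_unused: value of an assumed MCA param, off by default. */
static bool should_warn_unused(int num_devices, bool warn_if_unused)
{
    if (num_devices > 0) {
        return false;        /* hardware found: nothing to warn about */
    }
    return warn_if_unused;   /* no hardware: warn only if asked to */
}

/* Prints the warning only when the policy above says so. */
static void report_unused(int num_devices, bool warn_if_unused)
{
    if (should_warn_unused(num_devices, warn_if_unused)) {
        fprintf(stderr, "warning: openib BTL found no usable devices\n");
    }
}
```

With this policy, a machine that only has libibverbs installed stays silent by default, and a site with verbs-capable hardware turns the flag on to be told when a node falls back to TCP.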

> It would be great if libibverbs could return two different error messages - one for "there's no IB card in this machine" and one for "there's an IB card here, but we can't initialize it". I think that would make this argument go away. Open MPI could probably mimic that behavior by parsing the PCI tables, but that sounds ... painful.

Yes, this capability in libibverbs would be good. Parsing the PCI tables doesn't sound like our role.

I'll ask the libibverbs authors about it...
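For illustration, the distinction discussed above could look roughly like the sketch below. The enum and function are hypothetical; per this thread, libibverbs does not report these two cases separately today.

```c
#include <stdbool.h>

/* Hypothetical probe results; not part of libibverbs. */
typedef enum {
    PROBE_OK,            /* device present and initialized */
    PROBE_NO_HARDWARE,   /* no verbs-capable card in this machine */
    PROBE_INIT_FAILED    /* card present, but initialization failed */
} probe_result_t;

/* Warn only when hardware exists but could not be brought up;
 * stay silent on machines that simply have no card. */
static bool probe_should_warn(probe_result_t result)
{
    return result == PROBE_INIT_FAILED;
}
```

If the library could tell these cases apart, the busted-IB-stack case would always warn and the libibverbs-only case would never warn, with no MCA param needed for either.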

> I guess the root of my concern is that unexpected behavior with no explanation is (in my mind) the most dangerous case and the one we should address by default. And turning this error message off is going to cause unexpected behavior without explanation.


But more information is available, and subject to normal troubleshooting techniques. And if you're in an environment where you *do* want to use verbs-capable hardware, then setting the MCA param seems perfectly acceptable to me. IIRC, LANL sets a whole pile of MCA params in the top-level openmpi-mca-params.conf file that are specific to their environment (right?). If that's true, what's one more param?

Heck, the OMPI installed by OFED (which is what most users of verbs-capable hardware run) can set an MCA param in openmpi-mca-params.conf by default. That would solve the issue for 98% of the IB/iWARP users out there. Those who compile from source would need to do it manually.

I agree that this is less than perfect. My main point is that I really don't like the idea that "mpirun a.out" will result in warning messages for perfectly valid configurations.

--
Jeff Squyres
Cisco Systems
