As far as I know, only the OpenIB kernel drivers are installed by default with the distribution. The user-level pieces (libibverbs and the other OpenIB components) are not installed by default; the user has to go to the package manager and explicitly select libibverbs. So if a user decided to install libibverbs, he had a reason for it, and I think it is OK to show this warning.

Pasha.

Jeff Squyres wrote:
On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:

I think having a parameter to turn off the warning is a great idea. So great, in fact, that it already exists in the trunk and v1.2 :)! Changing the default value of the btl_base_warn_component_unused flag from 1 to 0 will have the desired effect.

Ah, ok.  I either didn't know about this flag or forgot about it.  :-)

I just tested this myself and see that there are actually *two* error messages (on a machine where I installed libibverbs, but with no OpenFabrics hardware, with OMPI 1.2.6):

% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

So the MCA param takes care of the OMPI message; I'll contact the libibverbs authors about their message.
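
For reference, a minimal sketch of turning the Open MPI warning off for one shell session instead of changing the default. It assumes the usual convention that an MCA parameter can be set through an environment variable named OMPI_MCA_<param>. The hello program is the one from the run above, and the mpirun lines are left as comments because they need a working Open MPI install.

```shell
# Assumption: Open MPI picks up MCA parameters from OMPI_MCA_<name> variables.
OMPI_MCA_btl_base_warn_component_unused=0
export OMPI_MCA_btl_base_warn_component_unused

# With the variable exported, the run from above is unchanged:
#   mpirun -np 1 hello
# The same setting can also be passed for a single run on the command line:
#   mpirun --mca btl_base_warn_component_unused 0 -np 1 hello
```

This only silences the Open MPI message; the libibverbs line is printed by the library itself and is not affected.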

I'm not sure I agree with setting the default to 0, however. The warning has proven extremely useful for diagnosing that IB (or less often GM or MX) isn't properly configured on a compute node due to some random error.
It's trivially easy for any packaging group to have the line

  btl_base_warn_component_unused = 0

added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).
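
The packaging step described above might look roughly like the following. This is a sketch, not what the LANL or Debian build scripts actually do: the function name disable_unused_btl_warning and its prefix argument are hypothetical, and only the file path and the parameter line come from the text above.

```shell
# Hypothetical packaging helper: add the setting to the system-wide MCA
# parameter file under the given install prefix.
disable_unused_btl_warning() {
    prefix="$1"
    conf="$prefix/etc/openmpi-mca-params.conf"
    line="btl_base_warn_component_unused = 0"

    # Make sure the etc directory exists in a fresh install tree.
    mkdir -p "$prefix/etc"

    # Append only if the exact line is not already present, so that
    # repeated package builds do not duplicate it.
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

A package build could call it as disable_unused_btl_warning "$prefix" once the install step has finished.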

I think keeping the Debian guys happy is a good thing.  Giving them an
easy way to turn off silly warnings is a good thing.  Removing a known
useful warning to help them doesn't seem like a good thing.

I guess that this is what I am torn about. Yes, it's a useful message -- in some cases. But now that libibverbs is shipping in Debian and other Linux distributions, the number of machines out there with verbs-capable hardware is far, far smaller than the number of machines without verbs-capable hardware. Specifically:

1. The number of cases where seeing the message by default is *not* useful is now potentially [much] larger than the number of cases where the default message is useful.

2. An out-of-the-box "mpirun a.out" will print warning messages in perfectly valid/good configurations (no verbs-capable hardware, but just happen to have libibverbs installed). This is a Big Deal.

3. Problems with HCA hardware and/or verbs stack are uncommon (nowadays). I'd be ok asking someone to enable a debug flag to get more information on configuration problems or hardware faults.

Shouldn't we be optimizing for the common case?

In short: I think it's no longer safe to assume that machines with libibverbs installed must also have verbs-capable hardware.

