As far as I know, only the OpenIB kernel drivers are installed by default
with the distribution.
The user-level pieces (libibverbs and the other OpenIB packages) are not
installed by default. The user needs to go to the package manager and
explicitly select libibverbs. So if a user decided to install libibverbs,
he had reasons for it, and I think it is OK to show this warning.
Pasha.
Jeff Squyres wrote:
On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:
I think having a parameter to turn off the warning is a great idea. So
great, in fact, that it already exists in the trunk and v1.2 :)! Setting
the default value for the btl_base_warn_component_unused flag from 1 to 0
will have the desired effect.
Ah, ok. I either didn't know about this flag or forgot about it. :-)
I just tested this myself and see that there are actually *two* error
messages (on a machine where I installed libibverbs, but with no
OpenFabrics hardware, with OMPI 1.2.6):
% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
So the MCA param takes care of the OMPI message; I'll contact the
libibverbs authors about their message.
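For reference, the MCA parameter can also be set per shell or per run without touching any installed files. A minimal sketch, assuming a Bourne-style shell and the same hello test program as above; the OMPI_MCA_ environment prefix and the --mca option are Open MPI's standard ways of setting MCA parameters, and the guard around mpirun is only there so the snippet is harmless on a machine without Open MPI or the test binary:

```shell
# Option 1: set the parameter for every mpirun started from this shell.
export OMPI_MCA_btl_base_warn_component_unused=0

# Option 2: set it for a single run on the command line.
# Guarded so nothing is launched if mpirun or ./hello is missing.
if command -v mpirun >/dev/null 2>&1 && [ -x ./hello ]; then
    mpirun --mca btl_base_warn_component_unused 0 -np 1 ./hello
fi
```

Either form only silences the OMPI warning; the "libibverbs: Fatal" line comes from libibverbs itself and is not affected.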
I'm not sure I agree with setting the default to 0, however. The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.
It's trivially easy for any packaging group to have the line
btl_base_warn_component_unused = 0
added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).
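The install-phase step described above could be sketched roughly as follows. This is an illustration only, not the LANL build scripts: PREFIX is a hypothetical stand-in for the package's install prefix and defaults to a scratch directory so the snippet is safe to try.

```shell
# PREFIX stands in for the package's install prefix (an assumption);
# it falls back to a throwaway directory when not set.
PREFIX="${PREFIX:-$(mktemp -d)}"
CONF="$PREFIX/etc/openmpi-mca-params.conf"

mkdir -p "$PREFIX/etc"
touch "$CONF"

# Append the setting only if the file does not already set this parameter,
# so running the step twice does not duplicate the line.
if ! grep -q '^[[:space:]]*btl_base_warn_component_unused[[:space:]]*=' "$CONF"; then
    printf '%s\n' 'btl_base_warn_component_unused = 0' >> "$CONF"
fi
```

A site that does have verbs-capable hardware can put the warning back by changing the value to 1 in the same file.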
I think keeping the Debian guys happy is a good thing. Giving them an
easy way to turn off silly warnings is a good thing. Removing a known
useful warning to help them doesn't seem like a good thing.
I guess that this is what I am torn about. Yes, it's a useful message
-- in some cases. But now that libibverbs is shipping in Debian and
other Linuxes, the number of machines out there with verbs-capable
hardware is far, far smaller than the number of machines without verbs-
capable hardware. Specifically:
1. The number of cases where seeing the message by default is *not*
useful is now potentially [much] larger than the number of cases where
the default message is useful.
2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed). This is a Big Deal.
3. Problems with HCA hardware and/or verbs stack are uncommon
(nowadays). I'd be ok asking someone to enable a debug flag to get
more information on configuration problems or hardware faults.
Shouldn't we be optimizing for the common case?
In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.
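To make the distinction concrete, here is a minimal sketch of checking for verbs hardware separately from checking whether libibverbs is installed. It assumes the usual Linux sysfs layout, where the kernel lists RDMA devices under /sys/class/infiniband; the function name count_verbs_devices is made up for this example, and this is not how Open MPI itself detects HCAs.

```shell
# Count the entries in a sysfs-style device directory.
# A missing directory is treated as "no devices".
count_verbs_devices() {
    dir="${1:-/sys/class/infiniband}"
    if [ -d "$dir" ]; then
        ls -A "$dir" | wc -l | tr -d ' '
    else
        echo 0
    fi
}

n=$(count_verbs_devices)
if [ "$n" -eq 0 ]; then
    echo "no verbs-capable devices visible"
else
    echo "$n verbs-capable device(s) visible"
fi
```

On a machine like the one in the test above (libibverbs installed, no OpenFabrics hardware), one would expect this to report no devices even though the library is present.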