On May 21, 2008, at 11:14 AM, Brian W. Barrett wrote:
I think having a parameter to turn off the warning is a great idea. So
great, in fact, that it already exists in the trunk and v1.2 :)! Setting
the default value for the btl_base_warn_component_unused flag from 1 to 0
will have the desired effect.
Ah, ok. I either didn't know about this flag or forgot about it. :-)
I just tested this myself and see that there are actually *two* error
messages (on a machine where I installed libibverbs, but with no
OpenFabrics hardware, with OMPI 1.2.6):
% mpirun -np 1 hello
libibverbs: Fatal: couldn't read uverbs ABI version.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host eddie.osl.iu.edu was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
So the MCA param takes care of the OMPI message; I'll contact the
libibverbs authors about their message.
I'm not sure I agree with setting the default to 0, however. The warning
has proven extremely useful for diagnosing that IB (or less often GM or
MX) isn't properly configured on a compute node due to some random error.
It's trivially easy for any packaging group to have the line
btl_base_warn_component_unused = 0
added to $prefix/etc/openmpi-mca-params.conf during the install phase of
the package build (indeed, our simple build scripts at LANL used to do
this on a regular basis due to our need to tweak the OOB to keep IPoIB
happier at scale).
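
As an illustration of the install-phase step described above, here is a minimal sketch of how a packaging script might add that line. The helper name disable_unused_btl_warning and the choice to leave any existing setting untouched are assumptions made for this example, not something taken from the LANL or Debian build scripts.

```python
from pathlib import Path

# MCA parameter named in this thread.
PARAM = "btl_base_warn_component_unused"


def disable_unused_btl_warning(conf_path):
    """Append `PARAM = 0` to an MCA params file unless PARAM is already set.

    Hypothetical helper for a package's install phase. Returns True if the
    file was modified, False if an uncommented setting was already present.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        if stripped.split("=", 1)[0].strip() == PARAM:
            return False  # leave an existing setting alone

    lines.append(f"{PARAM} = 0")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")
    return True
```

A build script would call it with the installed path, for example disable_unused_btl_warning(prefix + "/etc/openmpi-mca-params.conf"), where prefix stands for whatever install prefix the package uses. Running it a second time leaves the file unchanged.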
I think keeping the Debian guys happy is a good thing. Giving them an
easy way to turn off silly warnings is a good thing. Removing a known
useful warning to help them doesn't seem like a good thing.
I guess that this is what I am torn about. Yes, it's a useful message
-- in some cases. But now that libibverbs is shipping in Debian and
other Linuxes, the number of machines out there with verbs-capable
hardware is far, far smaller than the number of machines without
verbs-capable hardware. Specifically:
1. The number of cases where seeing the message by default is *not*
useful is now potentially [much] larger than the number of cases where
the default message is useful.
2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed). This is a Big Deal.
3. Problems with HCA hardware and/or the verbs stack are uncommon
(nowadays). I'd be ok asking someone to enable a debug flag to get
more information on configuration problems or hardware faults.
Shouldn't we be optimizing for the common case?
In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.
--
Jeff Squyres
Cisco Systems