Then we disagree on a core point. I believe that users should never have something silently unexpected happen (like falling back to TCP from a high-speed interconnect because of a NIC reset / software issue). You clearly don't feel this way. I don't really work on the project, but do have lots of experience being yelled at by users when something unexpected happens.

I guarantee you we'll see a report of poor IB / application performance because of the silent fallback to TCP. There's a reason that error message was put in. I don't get a vote anymore, so do whatever you think is best.

Brian


On Wed, 21 May 2008, Jeff Squyres wrote:

One thing I should clarify -- the ibverbs error message from my
previous mail is a red herring.  libibverbs prints that message on
systems where the kernel portions of the OFED stack are not installed
(such as the quick-n-dirty test that I did before -- all I did was
install libibverbs without the corresponding kernel stuff).  I
installed the whole OFED stack on a machine with no verbs-capable
hardware and verified that the libibverbs message does *not* appear
when the kernel bits are properly installed and running.

So we're only talking about the Open MPI warning message here.  More
below.



On May 21, 2008, at 12:17 PM, Brian W. Barrett wrote:

2. An out-of-the-box "mpirun a.out" will print warning messages in
perfectly valid/good configurations (no verbs-capable hardware, but
just happen to have libibverbs installed).  This is a Big Deal.

Which is easily solved with a better error message, as Pasha
suggested.

I guess this is where we disagree: I don't believe that the issue is
solved by making a "better" message.  Specifically: this is the first
case where we're saying "if you run with a valid configuration, you're
going to get a warning message and you have to do something extra to
turn it off."

That just seems darn weird to me, especially when other MPIs don't do
the same thing.  Come to think of it, I can't think of many other
software packages that do that.

In short: I think it's no longer safe to assume that machines with
libibverbs installed must also have verbs-capable hardware.

But here's the real problem -- with our current selection logic, a user
with libibverbs but no IB cards gets an error message saying "hey, we need
you to set this flag to make this error go away" (or would, per Pasha's
suggestion).  A user with a busted IB stack on a node (which we still saw
pretty often at LANL) starts using TCP and their application runs like a
dog.

I guess it's a matter of how often you see errors in the IB stack that
cause NIC initialization to fail.  The machines I tend to use still
exhibit this problem pretty often, but it's possible I just work on bad
hardware more often than is usual in the wild.

I guess this is the central issue: what *is* the common case?  Which
set of users should be forced to do something different?

I'm claiming that now that the Linux distros are shipping libibverbs,
the number of users who have the openib BTL installed but do not have
verbs-capable hardware will be *much* larger than those with verbs-
capable hardware.  Hence, I think the pain point should be for the
smaller group (those with verbs-capable hardware): set an MCA param if
you want to see the warning message.

(we can debate the default value for the BTL-wide base param later --
let's first just debate the *concept* as specific to the openib BTL)

It would be great if libibverbs could return two different error messages
- one for "there's no IB card in this machine" and one for "there's an IB
card here, but we can't initialize it".  I think that would make this
argument go away.  Open MPI could probably mimic that behavior by parsing
the PCI tables, but that sounds ... painful.

Yes, this capability in libibverbs would be good.  Parsing the PCI
tables doesn't sound like our role.

I'll ask the libibverbs authors about it...

I guess the root of my concern is that unexpected behavior with no
explanation is (in my mind) the most dangerous case and the one we should
address by default.  And turning this error message off is going to cause
unexpected behavior without explanation.


But more information is available, and subject to normal
troubleshooting techniques.  And if you're in an environment where you
*do* want to use verbs-capable hardware, then setting the MCA param
seems perfectly acceptable to me.  IIRC, LANL sets a whole pile of MCA
params in the top-level openmpi-mca-params.conf file that are specific
to their environment (right?).  If that's true, what's one more param?

Heck, the OMPI installed by OFED can set an MCA param in
openmpi-mca-params.conf by default (which is what most
verbs-capable-hardware users utilize).  That would solve the issue for
98% of the IB/iWARP users out there.  Those who compile from source
would need to do it manually.

I agree that this is less than perfect.  My main point is that I
really don't like the idea that "mpirun a.out" will result in warning
messages for perfectly valid configurations.
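
To make the disagreement above concrete, here is a rough sketch of the
warning policy being debated.  It is not Open MPI code: ProbeResult,
should_warn, and warn_if_unused are hypothetical names made up for
illustration, and the NO_HARDWARE / INIT_FAILED distinction is the
capability the thread wishes libibverbs had, not something it is known
to provide.  UNKNOWN models today's situation, where the two cases
cannot be told apart and a param has to decide.

```python
from enum import Enum, auto


class ProbeResult(Enum):
    """Hypothetical outcomes of probing for verbs-capable hardware."""
    OK = auto()           # device present and usable
    NO_HARDWARE = auto()  # no IB/iWARP device in this machine
    INIT_FAILED = auto()  # device present, but it could not be initialized
    UNKNOWN = auto()      # cannot tell the two failure cases apart


def should_warn(probe: ProbeResult, warn_if_unused: bool) -> bool:
    """Decide whether to print a warning before falling back to TCP.

    warn_if_unused stands in for the MCA param discussed in the thread
    (hypothetical name); it only matters when the probe is ambiguous.
    """
    if probe in (ProbeResult.OK, ProbeResult.NO_HARDWARE):
        return False          # valid configuration: stay quiet
    if probe is ProbeResult.INIT_FAILED:
        return True           # broken stack: never fall back silently
    return warn_if_unused     # UNKNOWN: the default is what is in dispute
```

With the distinction available, the param stops mattering for the two
cases people care about; only the UNKNOWN branch leaves a default to
argue over.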

