On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:

It would be great if libibverbs could return two different error messages - one for "there's no IB card in this machine" and one for "there's an IB card here, but we can't initialize it".  I think that would make this argument go away.  Open MPI could probably mimic that behavior by parsing the PCI tables, but that sounds ... painful.


Thinking about this a bit more -- I think it depends on what kind of errors you are worried about seeing. IBV does separate the discovery of devices (ibv_get_device_list) from trying to open a device (ibv_open_device). So hypothetically, we *can* distinguish between these kinds of errors already.

Do you see devices that are so broken that they don't show up in the list returned from ibv_get_device_list?

FWIW: the *only* case I'm talking about changing the default for is when ibv_get_device_list returns an empty list (meaning that, according to the verbs stack, there are no devices in the host). I think that we should *always* warn for any kind of error that occurs after that (e.g., we find a device but can't open it, we find one or more devices but no active ports, etc.).
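
To make the proposed policy concrete, here is a rough sketch of the decision logic in C. It is deliberately hardware-free: the counts are assumed to have been gathered elsewhere, for example from the num_devices value filled in by ibv_get_device_list(), from the number of non-NULL returns of ibv_open_device(), and from ports that ibv_query_port() reports as IBV_PORT_ACTIVE. The names probe_result, probe_counts, classify_probe, should_warn and warn_if_no_devices are hypothetical, made up for illustration; they are not Open MPI or libibverbs identifiers.

```c
#include <stdbool.h>

/* Hypothetical outcomes of probing the verbs stack (illustrative names only). */
enum probe_result {
    PROBE_NO_DEVICES,      /* the device list was empty */
    PROBE_OPEN_FAILED,     /* devices were listed, but none could be opened */
    PROBE_NO_ACTIVE_PORTS, /* at least one device opened, but no port is active */
    PROBE_OK               /* at least one usable port was found */
};

/* Counts assumed to be collected by the caller from the verbs calls. */
struct probe_counts {
    int listed;       /* devices reported by ibv_get_device_list() */
    int opened;       /* devices for which ibv_open_device() succeeded */
    int active_ports; /* ports in state IBV_PORT_ACTIVE per ibv_query_port() */
};

/* Map the raw counts onto one of the outcomes above. */
static enum probe_result classify_probe(struct probe_counts c)
{
    if (c.listed <= 0)
        return PROBE_NO_DEVICES;
    if (c.opened <= 0)
        return PROBE_OPEN_FAILED;
    if (c.active_ports <= 0)
        return PROBE_NO_ACTIVE_PORTS;
    return PROBE_OK;
}

/* Only the "no devices at all" case is governed by the configurable
 * default; every failure after discovery always produces a warning. */
static bool should_warn(enum probe_result r, bool warn_if_no_devices)
{
    switch (r) {
    case PROBE_OK:
        return false;
    case PROBE_NO_DEVICES:
        return warn_if_no_devices;
    default:
        return true;
    }
}
```

With warn_if_no_devices set to false, a host with an empty device list stays quiet, while a host with a device that cannot be opened, or with no active ports, still warns.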

--
Jeff Squyres
Cisco Systems
