On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:
It would be great if libibverbs could return two different error messages - one for "there's no IB card in this machine" and one for "there's an IB card here, but we can't initialize it". I think that would make this argument go away. Open MPI could probably mimic that behavior by parsing the PCI tables, but that sounds ... painful.
Thinking about this a bit more -- I think it depends on what kind of errors you are worried about seeing. IBV does separate the discovery of devices (ibv_get_device_list) from trying to open a device (ibv_open_device). So hypothetically, we *can* distinguish between these kinds of errors already.
Do you see devices that are so broken that they don't show up in the list returned from ibv_get_device_list?
FWIW: the *only* case I'm talking about changing the default for is when ibv_get_device_list returns an empty list (meaning that, according to the verbs stack, there are no devices in the host). I think that we should *always* warn for any kind of error that occurs after that (e.g., we find a device but can't open it, we find one or more devices but no active ports, etc.).
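
To make the proposed policy concrete, here is a minimal sketch in C of the decision logic only. It is not Open MPI code: the enum, the function names, and the three counters are hypothetical and exist just for illustration. In a real probe the counts would presumably come from ibv_get_device_list (its num_devices output), from how many ibv_open_device calls succeed, and from how many ports ibv_query_port reports in the IBV_PORT_ACTIVE state. The sketch also does not model ibv_get_device_list itself failing (returning NULL), which would need separate handling.

```c
#include <stdbool.h>

/* Hypothetical outcome categories; the names are illustrative only. */
enum probe_result {
    PROBE_NO_DEVICES,      /* the device list was empty */
    PROBE_OPEN_FAILED,     /* devices were listed, but none could be opened */
    PROBE_NO_ACTIVE_PORTS, /* a device was opened, but no port is active */
    PROBE_OK               /* at least one usable device and active port */
};

/* Classify a probe from three counts gathered by the caller (assumed inputs):
 *   num_devices      - entries in the discovered device list
 *   num_opened       - devices that were opened successfully
 *   num_active_ports - ports found in the active state on opened devices
 * The checks run in the same order as discovery, open, and port query. */
enum probe_result classify_probe(int num_devices, int num_opened,
                                 int num_active_ports)
{
    if (num_devices <= 0) {
        return PROBE_NO_DEVICES;
    }
    if (num_opened <= 0) {
        return PROBE_OPEN_FAILED;
    }
    if (num_active_ports <= 0) {
        return PROBE_NO_ACTIVE_PORTS;
    }
    return PROBE_OK;
}

/* True for the failures that should always produce a warning under the
 * proposal: anything that goes wrong after a device has been discovered.
 * PROBE_NO_DEVICES is excluded because it is the one case whose default
 * is under discussion. */
bool always_warn(enum probe_result r)
{
    return r == PROBE_OPEN_FAILED || r == PROBE_NO_ACTIVE_PORTS;
}
```

With this split, only PROBE_NO_DEVICES would be governed by the default being debated; the other two failure cases would warn unconditionally.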
-- 
Jeff Squyres
Cisco Systems