On May 28, 2008, at 8:02 AM, Jeff Squyres wrote:

Note that the two /sys checks may be redundant; I'm not entirely sure
how the two files relate to each other.  libibverbs will complain
about the first if it is not present; the second is used to indicate
that the kernel drivers are loaded.

I got some more feedback from Roland off-list explaining that if /sys/class/infiniband exists and is non-empty but /sys/class/infiniband_verbs/abi_version does not exist, then this is definitely a case where we want to warn, because it implies that the config is screwed up -- RDMA devices are present but not usable.
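For reference, the state Roland describes could be detected along these lines. This is only a rough C sketch, not OMPI or libibverbs code: the names dir_is_nonempty() and rdma_present_but_unusable() are made up for illustration, and the sysfs root is passed in as a parameter rather than hard-coded.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: true if `path` is a directory holding at least
 * one entry besides "." and "..". */
bool dir_is_nonempty(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    bool found = false;

    if (dir == NULL) {
        return false;   /* missing or unreadable counts as empty */
    }
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 &&
            strcmp(entry->d_name, "..") != 0) {
            found = true;
            break;
        }
    }
    closedir(dir);
    return found;
}

/* Hypothetical helper: true when devices are listed under
 * <sysfsdir>/class/infiniband but
 * <sysfsdir>/class/infiniband_verbs/abi_version is missing,
 * i.e. RDMA devices are present but not usable. */
bool rdma_present_but_unusable(const char *sysfsdir)
{
    char devices[4096];
    char abi[4096];
    struct stat st;

    snprintf(devices, sizeof(devices), "%s/class/infiniband", sysfsdir);
    snprintf(abi, sizeof(abi), "%s/class/infiniband_verbs/abi_version",
             sysfsdir);
    return dir_is_nonempty(devices) && stat(abi, &st) != 0;
}
```

On a real system this would be called as rdma_present_but_unusable("/sys").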

In this case, I think the warning that libibverbs itself prints is suitable ("Fatal: couldn't read..."). So let's just eliminate that check in OMPI and go with something like the following (pretty much exactly what was proposed a while ago by Pasha :-) ):

  # If sysfs/class/infiniband does not exist, the driver was not
  # started.  Therefore: assume that the user does not want RDMA
  # hardware support -- do *not* print a warning message.
  if (! -d "$sysfsdir/class/infiniband") {
      if ($always_want_to_see_warnings) {
          print "Warning: $sysfsdir/class/infiniband does not exist\n";
      }
      return SKIP_THIS_BTL;
  }

  # If we get to this point, the drivers are loaded and therefore we
  # will assume that there is supposed to be at least one RDMA device
  # present.  Warn if we don't find any.
  $list = ibv_get_device_list();
  if (empty($list)) {
      print "Warning: couldn't find any RDMA devices -- if you have no " .
            "RDMA devices, stop the driver to avoid this warning message\n";
      return SKIP_THIS_BTL;
  }

  # ...continue with initialization; warnings and errors are
  # *always* displayed after this point
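
The same flow, written as a small C sketch in case it helps the discussion. It is not the actual openib BTL code: check_rdma_setup(), enum btl_decision, and the always_warn flag are hypothetical names, and the device count is passed in as a parameter (in real code it would come from the num_devices output of ibv_get_device_list()) so that the sketch does not depend on libibverbs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical result type for this sketch. */
enum btl_decision { BTL_SKIP, BTL_CONTINUE };

/* sysfsdir:    sysfs mount point, normally "/sys"
 * num_devices: number of RDMA devices found by the caller
 * always_warn: user asked to see every warning */
enum btl_decision check_rdma_setup(const char *sysfsdir, int num_devices,
                                   bool always_warn)
{
    char path[4096];
    struct stat st;

    snprintf(path, sizeof(path), "%s/class/infiniband", sysfsdir);

    /* No class/infiniband directory: the driver was not started, so
     * assume the user does not want RDMA support and stay quiet. */
    if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) {
        if (always_warn) {
            fprintf(stderr, "Warning: %s does not exist\n", path);
        }
        return BTL_SKIP;
    }

    /* Drivers are loaded, so at least one device is expected. */
    if (num_devices <= 0) {
        fprintf(stderr, "Warning: couldn't find any RDMA devices -- "
                "if you have no RDMA devices, stop the driver to "
                "avoid this warning message\n");
        return BTL_SKIP;
    }

    /* From here on, warnings and errors are always displayed. */
    return BTL_CONTINUE;
}
```

Taking the device count as an argument keeps the decision logic separate from device discovery, so it can be exercised against a fake sysfs tree.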

--
Jeff Squyres
Cisco Systems
