Mike/Cindi,

As you may recall, one of the problems with "style 2" DLPI datalinks is
that opening them bypasses the FMA I/O retire checks in spec_open() (since
the kernel doesn't know what piece of hardware is actually being accessed
until the DL_ATTACH_REQ is done).

However, the /dev/net directory introduced by the recent Clearview UV
putback consists only of "style 1" DLPI links, which the spec_open()
checks correctly catch, causing ENXIO to be returned.  Since all libdlpi
applications check /dev/net first, these style-1 links are now preferred.
From a RAS standpoint, this is a marked improvement.  However, we've
already encountered a handful of systems with a network device that
apparently mostly worked (even though FMA had retired it) which failed to
open with ENXIO after upgrading to the UV bits.  Of course, once the user
runs "fmadm faulty", everything falls into place -- but the connection
between the ENXIO error and FMA may not occur to most users (especially
since FMA may have done the retire months ago).  I fear this will lead to
support calls and frustration.

As such, I had a few points I wanted your input on:

        1. Has there been any discussion of a new errno for this case?
           If we had a new errno, such as ERETIRED or EFAULTED, API
           consumers could differentiate this case if appropriate, and
           moreover strerror() could say something more helpful than "No
           such device or address".

        2. It seems uneven to have retired networking hardware but not
           have anything reported by dladm -- minimally, I'd think it
           appropriate for show-phys to report this, and (given the
           severity) maybe show-link as well.  (However, I don't want
           dladm to impinge on fmadm's duties.)

        3. It worries me that in all the cases we've seen thus far, the
           fault was "repaired" and never seen again.  Is this common, or
           is this indicative of bugs in our fault detection code?
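
To make point 1 concrete, here is a rough sketch of how a consumer might
tell the two cases apart.  Everything in it is an assumption for
illustration only: ERETIRED does not exist today, the value 1000 is an
arbitrary placeholder, and link_open_strerror() is a made-up helper, not
part of libdlpi or libc.

```c
#include <errno.h>
#include <string.h>

/* HYPOTHETICAL: no such errno exists today; the value is a placeholder. */
#ifndef ERETIRED
#define	ERETIRED	1000
#endif

/*
 * Hypothetical helper: return a human-readable reason for a failed
 * datalink open.  Only the proposed ERETIRED case gets special
 * treatment; every other errno falls through to strerror().
 */
const char *
link_open_strerror(int err)
{
	if (err == ERETIRED)
		return ("Device retired by FMA; run fmadm faulty");
	return (strerror(err));
}
```

With something along these lines, an open that fails with ENXIO would
still yield the usual strerror() text, while a retired device would point
the user straight at fmadm.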

If these things have been discussed in the past, pointers are welcome.

Thanks,
-- 
meem
