Hi Garrett,

It's nice to see someone tackling another IO subsystem.

You should separate 'faults' from 'errors'. In FMA, a fault is defined
as something that is broken (and associated with a piece of hardware)
or defective (and associated with a piece of code). What you have
described below are errors. Errors are symptoms produced by faults. We
can use the information captured at the time the error is detected to
work out what is broken or defective.

It's kinda like when you go to the doctor with a bunch of symptoms that
you've noticed and ask for a diagnosis. You wouldn't want the doctor to
just reiterate your symptoms back to you. You want the doctor to tell
you what's wrong with you. That's what we do with FMA: error
information is captured by the error detectors and fed to a diagnosis
engine, which tells us what's broken. For example, a PCI parity error
results in a diagnosis that tells us that a PCI card may be busted and
needs to be replaced.

Sorry for the diatribe but it's important to make sure we're on the
same page.

First thing to do is describe the different types of faulty or
defective components in your subsystem. Something like:

  - controller
  - sdcard
  - firmware (?)
  - target

We call these ASRUs (or sometimes resources). Now think about how each
of the error symptoms below can be explained by one or more faults in
your ASRU list. What algorithm would you use, given each possible error
or set of errors, to diagnose the problem and answer the question:
what's broken?

Garrett D'Amore wrote:
> First a bit of background. I've developed a framework for SDcard
> drivers called "sda". This supports both host drivers (e.g. "sdhost")
> and target drivers (e.g. "sdcard"). Actually, "sdcard" itself is a
> pseudo-nexus driver like scsa2usb... it allows "sd(7d)" to act as the
> ultimate target for these kinds of memory cards. The full details are
> in PSARC 2007/659 (SDcard Stack Phase I.)
>
> So what I'm trying to figure out is how to "enable" this stuff for FMA.
> (Or, alternatively, get an appropriate waiver.
> That might not be as bad as it sounds... it's probably pretty
> unlikely that anyone will care too much if their SDcard goes south...
> just remove and reinsert in most cases.)
>
> There are several classes of fault that I can imagine occurring:
>
> 1) errors coming from the host's parent. E.g. PCI parity errors, etc.
> I think I understand the docs on how to do this.

Here, I think your nexus or framework need simply call
pci_ereport_post() and the generic PCI diagnosis algorithms should work
out the faulty ASRU (controller).

>
> 2) errors that are specific to the host controller. E.g. an
> over-current error, or a CRC error interrupt on the SD data pins.

These errors sound hardware specific and you may need to define special
diagnosis algorithms, but perhaps there are certain classes of errors
that can be diagnosed by a general-purpose algorithm.

>
> 3) errors that only the framework can tell. E.g. the card is requesting
> an illegal voltage change, or the card has failed to generate a
> "relative card address" properly after several attempts. Clearly it
> would be nice if the framework could participate here.

Absolutely. This is where the framework can detect and report errors
(ereport events) and diagnose problems that are common to all
components under its control w/o having to involve your consumers.
Typically, what happens is you develop an error reporting interface
(a la pci_fm_ereport_post()) for errors detected by the framework. You
can use fm_ereport_post() (uts/common/os/fm.c) or ddi_fm_ereport_post()
(uts/common/os/ddifm.c) as the underlying implementation.
ddi_fm_ereport_post() is evolving whereas the interfaces in fm.c are
project private. Think about the ereport classes and event payload your
diagnosis software will need to work out what's wrong, and design the
interfaces accordingly. And just like for 2), you'll need to come up
with the algorithms to diagnose these errors and determine which ASRUs
(resources) are faulty.
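
To give a feel for the shape of such a reporting interface, here is a
rough userland sketch. Everything in it is made up for illustration:
the sda_err_t values, the sda_ereport_t struct, the function
sda_ereport_build() and the "ereport.io.sda.*" class names are
assumptions, not existing interfaces. In the kernel the payload would
be an nvlist and the routine would hand it to one of the posting
functions mentioned above rather than fill in a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical framework-detected errors; names are illustrative only. */
typedef enum {
	SDA_ERR_ILLEGAL_VOLTAGE,	/* card asked for an illegal voltage change */
	SDA_ERR_RCA_FAILED,		/* card never produced a relative card address */
	SDA_ERR_COUNT
} sda_err_t;

/* Stand-in for the ereport payload (an nvlist in the real kernel). */
typedef struct {
	char		ereport_class[64];	/* assumed class naming scheme */
	int		slot;			/* which slot detected the error */
	unsigned	retries;		/* attempts made before giving up */
} sda_ereport_t;

/* Assumed class leaf names, indexed by sda_err_t. */
static const char *const sda_err_leaf[SDA_ERR_COUNT] = {
	"illegal-voltage",
	"rca-failed"
};

/*
 * Fill in *ep for the given error. Returns 0 on success, or -1 if err
 * is not a known error or the class name does not fit in the buffer.
 */
int
sda_ereport_build(sda_ereport_t *ep, sda_err_t err, int slot,
    unsigned retries)
{
	int n;

	assert(ep != NULL);
	if ((unsigned)err >= (unsigned)SDA_ERR_COUNT)
		return (-1);

	n = snprintf(ep->ereport_class, sizeof (ep->ereport_class),
	    "ereport.io.sda.%s", sda_err_leaf[err]);
	if (n < 0 || (size_t)n >= sizeof (ep->ereport_class))
		return (-1);

	ep->slot = slot;
	ep->retries = retries;
	return (0);
}
```

The point of the sketch is only that the framework owns one routine
that turns "what I detected" into a class name plus the payload members
the diagnosis software will want (here a slot number and a retry
count), so individual host drivers don't each invent their own.
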
>
> 4) errors that the target driver can tell. E.g. a target-specific error
> in response to a block transfer. (E.g. an attempt to write a block to a
> protected sector.)

I think you can punt here to the common sd FMA project.

So now, you need to think about how you want to deliver your diagnosis
software. The algorithms can range from simple (map an error to a
fault) to complex. Some errors you may want to feed through serd
engines such that a certain number of errors have to occur before a
fault diagnosis is issued. Other diagnoses may rely upon the occurrence
of a particular combination of errors. In any case, there are two ways
to code your diagnosis software. The first is to write a set of eft
diagnosis rules like you see for PCI; the second is to write a C-based
fmd diagnosis plugin that subscribes to your particular error reports
(ereports). If most of your diagnoses are simple 1-to-1 mappings of
errors to faults, eft is probably your best bet. On the other hand,
complicated algorithms can be tricky when using an eft rule set.

>
> What I would like to do is have some help/guidance in figuring out how
> to architect FMA for this kind of solution. I did see PCI support, but
> I'm not finding any other good examples of my kind of framework with FMA
> support. (Notably neither USB nor 1394 frameworks have FMA support.)
> Can anyone offer specific advice or documentation to read? I've read
> the published documentation that I could find, but it seemed pretty
> specific to leaf-drivers, and I'm not sure how to get something like
> cases #2 and #3 handled properly.

This should be as clear as mud by now. Instructions on how to develop a
diagnosis plugin are described in the fmd PRM (see
http://opensolaris.org/os/community/fm). For samples in developing
ereport generation interfaces for your framework, search the
OpenSolaris code for calls to fm_ereport_post().
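
To make the serd engine idea above a bit more concrete, here is a
minimal standalone sketch of the "N errors within time T" test. It is
not fmd's implementation and does not use the fmd API; serd_t,
serd_init() and serd_record() are names made up for this example, and
timestamps are plain integers in whatever unit you like. It assumes
timestamps never go backwards, and unlike a real engine it does not
reset itself after firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define	SERD_MAX_N	16	/* arbitrary cap for this sketch */

/* Illustrative N-in-T engine state; not the fmd data structure. */
typedef struct {
	unsigned	n;			/* events needed to fire */
	uint64_t	t;			/* window length */
	uint64_t	stamps[SERD_MAX_N];	/* ring of last n timestamps */
	unsigned	count;			/* valid entries, at most n */
	unsigned	next;			/* ring slot to overwrite next */
} serd_t;

void
serd_init(serd_t *s, unsigned n, uint64_t t)
{
	assert(s != NULL && n >= 1 && n <= SERD_MAX_N);
	memset(s, 0, sizeof (*s));
	s->n = n;
	s->t = t;
}

/*
 * Record one error seen at time 'now'. Returns true when the last n
 * recorded errors, including this one, all fall within a span of t.
 */
bool
serd_record(serd_t *s, uint64_t now)
{
	uint64_t oldest;

	s->stamps[s->next] = now;
	s->next = (s->next + 1) % s->n;
	if (s->count < s->n)
		s->count++;
	if (s->count < s->n)
		return (false);

	/* Ring is full, so 'next' now indexes the oldest of the last n. */
	oldest = s->stamps[s->next];
	return (now - oldest <= s->t);
}
```

With n = 3 and t = 10, errors at times 0, 100 and 105 do not fire
because the first one is too old, but a further error at 108 does,
since 100, 105 and 108 all fall inside one window. A diagnosis would
typically issue the fault for the relevant ASRU at that point.
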

The final thing you'll need to do is write a libtopo enumerator to tack
on the SD topology (list of ASRU and resource instances controlled by
the sdcard framework). The latest PRM describes libtopo and how to
write an enumerator. There are also plenty of examples in the source
(lib/fm/topo/modules).

As far as your list of deliverables goes, it will look something like:

  - specification of ereport events for sdcard framework for 3)
  - optional specification for controllers for 2)
  - ereport generation routine for sdcard framework for 3)
  - optional ereport generation routine for controller drivers for 2)
  - diagnosis plugin or eft rules for 3) and optionally 2)
  - libtopo enumerator for the sdcard topology

Cindi
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org