Extending FMA to software is definitely within scope, but a few things need to happen first. The first (and simplest) step is to permit error telemetry to be posted at user level to fmd (the fault manager daemon). The second step requires modifications to SMF so that services can report error telemetry (ereport events), which is then sent to fmd and to a diagnosis engine written specifically for that telemetry. The result is an actionable diagnosis (a fault event), which is sent to an FMA response agent, and a message sent to the admin. The response agent is optional and could, as you suggest, take some action to allow the service to continue.
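To make the proposed flow concrete, here is a toy Python model of the routing only: a service posts an ereport, fmd hands it to the diagnosis engine written for that telemetry, and a resulting fault produces an admin message and is passed to any (optional) response agents. Everything in it is hypothetical: `ToyFaultManager`, `Ereport`, `Fault`, `make_threshold_engine`, the event class names and the message ID are illustrative stand-ins, not the real fmd module API or event registry contents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Ereport:
    """Error telemetry a service would post (hypothetical shape)."""
    service: str
    error_class: str
    detail: Dict[str, str] = field(default_factory=dict)


@dataclass
class Fault:
    """Actionable diagnosis emitted by a diagnosis engine (hypothetical shape)."""
    service: str
    fault_class: str
    msg_id: str


DiagnosisEngine = Callable[[Ereport], Optional[Fault]]
ResponseAgent = Callable[[Fault], None]


class ToyFaultManager:
    """Stand-in for fmd: routes ereports to engines and faults to agents."""

    def __init__(self, msg_base: str = "http://www.sun.com/msg") -> None:
        self.msg_base = msg_base
        self.engines: Dict[str, DiagnosisEngine] = {}
        self.agents: List[ResponseAgent] = []
        self.admin_messages: List[str] = []

    def register_engine(self, error_class: str, engine: DiagnosisEngine) -> None:
        self.engines[error_class] = engine

    def register_agent(self, agent: ResponseAgent) -> None:
        self.agents.append(agent)

    def post_ereport(self, ev: Ereport) -> Optional[Fault]:
        engine = self.engines.get(ev.error_class)
        if engine is None:
            return None  # no engine was written for this telemetry
        fault = engine(ev)
        if fault is None:
            return None  # evidence recorded, but no diagnosis yet
        # Every diagnosis produces an admin message pointing at a knowledge article.
        self.admin_messages.append(
            f"{fault.service}: {fault.fault_class}, "
            f"see {self.msg_base}/{fault.msg_id}"
        )
        # Response agents are optional; there may be none registered.
        for agent in self.agents:
            agent(fault)
        return fault


def make_threshold_engine(threshold: int, fault_class: str, msg_id: str) -> DiagnosisEngine:
    """Hypothetical engine: diagnose a fault after `threshold` ereports per service."""
    counts: Dict[str, int] = {}

    def engine(ev: Ereport) -> Optional[Fault]:
        counts[ev.service] = counts.get(ev.service, 0) + 1
        if counts[ev.service] < threshold:
            return None
        counts[ev.service] = 0  # reset so a repeat problem is diagnosed again
        return Fault(ev.service, fault_class, msg_id)

    return engine


if __name__ == "__main__":
    fm = ToyFaultManager()
    fm.register_engine("ereport.example.svc.timeout",
                       make_threshold_engine(3, "fault.example.svc.hung", "EXAMPLE-8000-01"))
    restarted: List[str] = []
    # A response agent that "restarts" the service by recording its FMRI.
    fm.register_agent(lambda f: restarted.append(f.service))
    for _ in range(3):
        fm.post_ereport(Ereport("svc:/site/example:default", "ereport.example.svc.timeout"))
    print(fm.admin_messages)
    print(restarted)
```

The threshold engine is just one simple diagnosis policy chosen for illustration; a real engine would be written against the specific telemetry the service reports.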
Now recall that the diagnosis message tells the admin to go to http://www.sun.com/msg/<msg-id> for additional diagnosis and repair details. So the third piece of infrastructure change is to permit third parties to put back to the event registry and to point diagnosis messages at their own equivalents of http://www.sun.com/msg.

The SMF changes have been requested by others, but I don't know the status of that work. Perhaps this is a good cross-community discussion to take up.

Cindi

--
This message posted from opensolaris.org
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org