Extending FMA to software is definitely within scope, but a few things need to 
happen first.  The first (and simplest) step is to permit error telemetry to be 
posted to fmd from user level.  The second step requires modifications to SMF 
so that services can report error telemetry (ereport events), which is then 
sent to fmd (the fault manager daemon) and on to a diagnosis engine written 
specifically for that telemetry.  The result is an actionable diagnosis (fault 
event) sent to an FMA response agent and a message sent to the admin.  The 
response agent is optional and could, as you suggest, take some action to allow 
the service to continue.
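
To make the flow concrete, here is a rough sketch of the path from ereport to 
fault to response agent.  It is a toy model in Python, not fmd's actual 
interface: every class, function, event class string, and message ID below is 
made up for illustration, and the "three failures means a fault" rule is just 
an assumed example of what a diagnosis engine might decide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Ereport:
    """Error telemetry reported by a service (hypothetical shape)."""
    event_class: str              # made-up class name, not a real registry entry
    service: str
    payload: dict = field(default_factory=dict)


@dataclass
class Fault:
    """Actionable diagnosis produced by a diagnosis engine."""
    msg_id: str                   # placeholder message ID
    service: str
    summary: str


class RestartLoopEngine:
    """Toy diagnosis engine: `threshold` ereports for one service yield one fault."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: Dict[str, int] = {}

    def receive(self, ereport: Ereport) -> Optional[Fault]:
        n = self.counts.get(ereport.service, 0) + 1
        self.counts[ereport.service] = n
        if n == self.threshold:
            return Fault("EXAMPLE-0001", ereport.service,
                         f"{ereport.service} failed {n} times")
        return None


class FaultManager:
    """Stand-in for the daemon: routes ereports to the engine, then fans out."""

    def __init__(self, engine: RestartLoopEngine,
                 notify_admin: Callable[[str], None],
                 response_agent: Optional[Callable[[Fault], None]] = None):
        self.engine = engine
        self.notify_admin = notify_admin
        self.response_agent = response_agent   # optional, as in the text

    def post(self, ereport: Ereport) -> Optional[Fault]:
        fault = self.engine.receive(ereport)
        if fault is None:
            return None
        # The admin always gets a message; the agent acts only if one is present.
        self.notify_admin(f"{fault.msg_id}: {fault.summary}")
        if self.response_agent is not None:
            self.response_agent(fault)
        return fault
```

A service would call post() each time it hits the error; the admin hears 
nothing until the engine decides the telemetry adds up to a fault.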

Now recall that the diagnosis message tells the admin to go to 
http://www.sun.com/msg/<msg-id> for additional diagnosis and repair details.  
So the third infrastructure change is to permit third parties to put back to 
the event registry and to point diagnosis messages at their own versions of 
http://www.sun.com/msg.
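
As a sketch of what "their own versions" could mean in practice, the lookup 
below picks a knowledge-article base URL per message ID.  This is only an 
illustration: the function name, the "ACME" prefix, the support.example.com 
site, and the assumption that the publisher can be read from the text before 
the first dash in the message ID are all hypothetical.

```python
DEFAULT_BASE = "http://www.sun.com/msg"

# Hypothetical mapping from message-ID prefix to a third party's own site.
THIRD_PARTY_BASES = {
    "ACME": "http://support.example.com/msg",
}


def knowledge_url(msg_id: str, bases: dict = THIRD_PARTY_BASES) -> str:
    """Return the URL an admin would be told to visit for this message ID."""
    # Assumption: the part before the first dash identifies the publisher.
    prefix = msg_id.split("-", 1)[0]
    base = bases.get(prefix, DEFAULT_BASE)
    return f"{base.rstrip('/')}/{msg_id}"
```

Message IDs with no registered prefix fall back to the default site, so 
existing diagnosis messages would keep pointing where they do today.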

The SMF changes have been requested by others, but I don't know the status of 
that work.  Perhaps this is a good cross-community discussion to take up.

Cindi
--
This message posted from opensolaris.org
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org
