Hello all!

We are constructing a distributed computing solution atop OpenSolaris 
virtualization technology. It employs a monitoring facility to understand and 
assess the condition of any given node under its charge. In reviewing FMA, it 
seems reasonable and quite simple to employ the SNMP trap mechanism to get the 
initial indication that something is amiss. However, at this level, we are 
uninterested in how to correct the fault; we are only concerned with the impact 
of the defect on the system, so that we can programmatically decide whether to 
remove the node from service or limit the amount of work allocated to it. Given 
that the trap only contains an error code to be referenced via the web site, we 
require a way in software to retrieve sufficient information to make such a 
decision.
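
To make the kind of decision we have in mind concrete, here is a minimal 
sketch in Python. Everything in it is an assumption for illustration: the 
FaultImpact record and its two fields are hypothetical stand-ins for whatever 
impact information can actually be retrieved, and the three actions simply 
mirror the choices described above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep in service"
    THROTTLE = "limit allocated work"
    REMOVE = "remove from service"


@dataclass
class FaultImpact:
    # Hypothetical fields: placeholders for the impact data we would like
    # to obtain for a diagnosed fault. They are not defined by FMA.
    service_degraded: bool  # node still works, but with reduced capacity
    service_lost: bool      # node can no longer do useful work


def decide(impact: FaultImpact) -> Action:
    """Pick the scheduler's response from the impact alone, not the repair."""
    if impact.service_lost:
        return Action.REMOVE
    if impact.service_degraded:
        return Action.THROTTLE
    return Action.KEEP
```

The open question is how to populate something like FaultImpact from the 
error code delivered in the trap.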

It would seem that there are a number of ways we might go about getting it. 
Ignoring the distasteful notion of screen scraping the online knowledge base, 
upon receiving a trap we could conceivably have our local agent exec "fmadm 
faulty" and parse its output, assuming it has sufficient structure. A second 
possibility would be to use the contents of the SUN-FM-MIB to retrieve the 
resource status, but my read is that some of the necessary information is not 
contained in the MIB. A further variation on this would be to incorporate the 
problem codes into our handler as an internal 'knowledge base' that determines 
the appropriate course of action, though I'm unclear as to where all of these 
codes are defined. Equally, I'm as yet unfamiliar with how fmdump and fmadm go 
about their business, but we could presumably mimic them to avoid parsing 
string output. Finally, we could install our own module and deal with the 
daemon directly. Perhaps there are other approaches as well.
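
As a rough illustration of the first and third options combined, the sketch 
below (Python) scans the text printed by "fmadm faulty" for anything shaped 
like a problem code and looks each one up in an internal table. Several things 
here are assumptions, not documented behaviour: the regular expression is only 
a guess at the general shape of a code, the EXAMPLE-* entries in POLICY are 
made-up placeholders, and nothing is assumed about the layout of the fmadm 
output beyond the codes appearing somewhere in it.

```python
import re
import subprocess

# Assumed general shape of a problem code: DICTIONARY-XXXX-XX in upper case.
# This is a guess for illustration, not a documented grammar.
MSG_ID = re.compile(r"\b[A-Z][A-Z0-9]*-[0-9A-Z]{4}-[0-9A-Z]{2,}\b")

# Hypothetical internal 'knowledge base'; the codes are placeholders.
POLICY = {
    "EXAMPLE-0000-01": "remove",
    "EXAMPLE-0000-02": "throttle",
}


def extract_msg_ids(text):
    """Return the distinct code-shaped tokens in text, in first-seen order."""
    seen = []
    for code in MSG_ID.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


def actions_for(text, default="investigate"):
    """Map every code found in text to an action; unknown codes get default."""
    return {code: POLICY.get(code, default) for code in extract_msg_ids(text)}


def current_faults_text():
    """Run 'fmadm faulty' and return its raw output (may need privileges)."""
    result = subprocess.run(
        ["fmadm", "faulty"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a trap, the agent would call actions_for(current_faults_text()). The 
weakness is plain: the table has to be maintained by hand, and the scan 
recovers only the code, not the impact of the fault.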

Is there any theory or prescribed methodology for implementing such a facility? 
Is such a notion contrary to the current FMA design center? Which, if any, of 
the above approaches seem appropriate? Is there an RFE lurking here, with a new 
facility required? TIA!

=Ron=
-- 
This message posted from opensolaris.org
_______________________________________________
fm-discuss mailing list