Dear Folks,

I am writing to ask for clues on how to get an itemised list of outages and their causes out of Nagios, one way or another.
The Nagios availability report does indeed provide a useful list of outages that can be wrapped and processed to one's heart's content, e.g.

HOST_NAME                  DOWN                 UP                   OUTAGE
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s

but Nagios has, AFAIK, no means of capturing event-related data and associating it with an outage event to produce something like

HOST_NAME                  DOWN                 UP                   OUTAGE      CAUSE  COMMENT
Albany_DEST_router         05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR -> down, provider
Albany_Optus_router_PE_in  05-12-2005 04:10:59  05-12-2005 08:42:29  4h 31m 30s  1      BDR -> down, provider
Lismore_DEST_router        05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      router restart by power-on
Lismore_Optus_router_PE_i  05-12-2005 16:11:30  05-12-2005 20:01:40  3h 50m 10s  2      power failure
Kempsey_DEST_router        05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR -> down, provider
Kempsey_Optus_router_PE_i  05-12-2005 13:16:39  05-12-2005 13:22:49  6m 10s      1      BDR -> down, provider
Broken_Hill_Optus_router_  05-12-2005 01:54:17  05-12-2005 01:57:27  3m 10s      5      dismiss
Broken_Hill_DEST_router    05-12-2005 01:56:07  05-12-2005 01:57:27  1m 20s      5      dismiss

In this case, CAUSE is a coded value that classifies the fault and COMMENT is free-form text.
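
To make the target concrete, here is a minimal Python sketch of the join this implies. It assumes the cause and comment are kept somewhere outside Nagios, keyed by host name and down time. The names `annotate`, `render`, `OUTAGES` and `ANNOTATIONS` are made up for illustration; nothing here is a Nagios API. The sample rows are copied from the report above, with one outage deliberately left unannotated.

```python
# Sketch only: join outage rows with hand-maintained cause/comment values.
# All names are hypothetical; Nagios itself provides none of this.

HEADER = ("HOST_NAME", "DOWN", "UP", "OUTAGE", "CAUSE", "COMMENT")

# Outage rows as they appear in the availability report.
OUTAGES = [
    ("Albany_DEST_router", "05-12-2005 04:10:59", "05-12-2005 08:42:29", "4h 31m 30s"),
    ("Kempsey_DEST_router", "05-12-2005 13:16:39", "05-12-2005 13:22:49", "6m 10s"),
    ("Broken_Hill_DEST_router", "05-12-2005 01:56:07", "05-12-2005 01:57:27", "1m 20s"),
]

# Annotations keyed by (host, down time); assumed to be filled in by an admin.
ANNOTATIONS = {
    ("Albany_DEST_router", "05-12-2005 04:10:59"): ("1", "BDR -> down, provider"),
    ("Broken_Hill_DEST_router", "05-12-2005 01:56:07"): ("5", "dismiss"),
}


def annotate(outages, annotations):
    """Attach (cause, comment) to each outage; blank when not yet known."""
    rows = []
    for host, down, up, outage in outages:
        cause, comment = annotations.get((host, down), ("", ""))
        rows.append((host, down, up, outage, cause, comment))
    return rows


def render(rows):
    """Lay the rows out as a fixed-width plain-text table with a header."""
    table = [HEADER] + list(rows)
    widths = [max(len(row[i]) for row in table) for i in range(len(HEADER))]
    lines = []
    for row in table:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(annotate(OUTAGES, ANNOTATIONS)))
```

Keying on host name plus down time is only one possible choice. It works as long as a host cannot go down twice in the same second.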
The best I can think of to create something like this is to

1 Append the outages to a file, possibly by having an event handler run the code that extracts the outage from the availability CGI. Better still, all the data for an outage is probably provided by the host or service macros, so the handler could append that to a file directly.

2 Have an admin edit the file and add the values when they become known.

The guts of the problem is that Nagios does the right thing by automatically changing the state of a monitored entity; there is no opportunity to 'officially' close the 'fault' by collecting user input and associating it with an outage. Looked at another way, outages don't really exist as first-class objects (with their own methods and data).

All comments are very welcome,

Yours sincerely.
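
A minimal Python sketch of the "append outages to a file" step, under several assumptions: a host event handler passes the values of the standard macros $HOSTNAME$, $HOSTSTATE$, $HOSTSTATETYPE$ and $TIMET$ to `record_event`; only HARD state changes are recorded; and UNREACHABLE is ignored. The function names, the tab-separated file layout and the date format are hypothetical choices, not anything Nagios prescribes.

```python
# Sketch only: log host state changes, then pair DOWN/UP into outage rows
# with empty CAUSE and COMMENT columns for an admin to fill in later.
import csv
from datetime import datetime

FIELDS = ["host", "down", "up", "outage", "cause", "comment"]


def record_event(log_path, host, state, state_type, timet):
    """Append one HARD state change to the event log; skip SOFT states."""
    if state_type != "HARD":
        return False
    with open(log_path, "a", newline="") as fh:
        csv.writer(fh, delimiter="\t").writerow([int(timet), host, state])
    return True


def read_events(log_path):
    """Load the event log as (epoch, host, state) tuples."""
    with open(log_path, newline="") as fh:
        return [(int(t), host, state)
                for t, host, state in csv.reader(fh, delimiter="\t")]


def format_duration(seconds):
    """Render a duration in the report's style, e.g. '4h 31m 30s'."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{secs}s")
    return " ".join(parts)


def stamp(epoch):
    """Format an epoch in local time; the field order is an assumption."""
    return datetime.fromtimestamp(epoch).strftime("%d-%m-%Y %H:%M:%S")


def pair_events(events):
    """Match each DOWN with the next UP for the same host."""
    open_since = {}  # host -> epoch of a DOWN still waiting for its UP
    rows = []
    for epoch, host, state in sorted(events):
        if state == "DOWN":
            open_since.setdefault(host, epoch)
        elif state == "UP" and host in open_since:
            down = open_since.pop(host)
            rows.append({
                "host": host,
                "down": stamp(down),
                "up": stamp(epoch),
                "outage": format_duration(epoch - down),
                "cause": "",    # left blank for the admin
                "comment": "",  # left blank for the admin
            })
    return rows


def write_report(rows, out_path):
    """Write the outage rows as a tab-separated file with a header line."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

A thin wrapper script around `record_event` could be named as the host's event handler, with the four macros as its arguments. Note that `write_report` overwrites its output, so a real version would have to carry over any cause and comment values already entered before regenerating the file.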