hi all,

a few weeks ago I posted a question to this list about passive service checks - I was actually experimenting with Nagios as an event log monitoring GUI. I am tracking event logs with SEC and also sending out alerts with it, but I would still like to see the correlated log messages in the Nagios web interface as well.
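(For context: SEC submits each correlated message to Nagios as a passive check result written to the external command pipe, roughly like the sketch below. This is only a sketch; the command file path and the host/service names are illustrative, not my actual setup.)

  #!/bin/sh
  # Sketch only: push one correlated log message into Nagios as a
  # CRITICAL passive service check result via the external command file.
  # The path and the host/service names are illustrative assumptions.
  CMDFILE=/usr/local/nagios/var/rw/nagios.cmd
  NOW=`date +%s`
  echo "[$NOW] PROCESS_SERVICE_CHECK_RESULT;node03;LogMessages;2;correlated event from SEC" >> $CMDFILE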
During the experimentation, I created a volatile service definition for a host group of Linux servers which looks similar to the example in the Nagios documentation: http://nagios.sourceforge.net/docs/2_0/int-snmptrap.html

I also have host checks enabled for the Linux host group, since I'd like to exploit the Nagios capability of suppressing service alerts when the host is down (I also have a number of active service checks enabled for these hosts, such as web server monitoring). However, when a lot of correlated log messages with CRITICAL severity are written to the Nagios command pipe in a short time period, a host check is run for each such message, which creates a lag between reading and displaying a message (the lag can be several minutes for the last message).

I could use a couple of tricks to avoid this:

1) disable host checks altogether (i.e., remove 'check_command' from the host definitions)
2) create a dummy host without 'check_command' that would have a special service (e.g. LogMessages) for displaying the log messages from all servers

(Both options are roughly sketched below.)

Still, is there a way to have the LogMessages service associated with each host, and also have host checks enabled? In other words, can I prevent Nagios from running a host check when a certain service goes to a non-OK state?

If someone has other clever ideas for setting up log monitoring in Nagios, please be so kind and comment :)
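For reference, here is roughly what options 1) and 2) would look like as object definitions. This is just a sketch: the template names, the address, and the check_ping command are placeholders rather than my real configuration.

  # Option 1: leave check_command out of the host definition, so Nagios
  # never schedules an on-demand host check and assumes the host is up.
  define host{
          use                     generic-host    ; assumed template name
          host_name               node03
          alias                   node03
          address                 192.168.1.3     ; illustrative address
          max_check_attempts      1
          }

  # Option 2: a dummy host without check_command, carrying a single
  # volatile LogMessages service that SEC feeds through the command pipe.
  define host{
          use                     generic-host
          host_name               logmessages     ; illustrative name
          alias                   Log message collector
          address                 127.0.0.1
          max_check_attempts      1
          }

  define service{
          use                     generic-service ; assumed template name
          host_name               logmessages
          service_description     LogMessages
          is_volatile             1
          active_checks_enabled   0
          passive_checks_enabled  1
          check_command           check_ping      ; required, but not scheduled
          max_check_attempts      1
          }

Anything not shown here (contacts, check intervals, notification settings) is assumed to come from the templates.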
br, risto

Marc Powell wrote:
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:nagios-users-[EMAIL PROTECTED] On Behalf Of Risto Vaarandi
>> Sent: Friday, August 10, 2007 6:43 AM
>> To: nagios-users@lists.sourceforge.net
>> Subject: [Nagios-users] passive service checks with 1 second interval
>>
>> However, when the service goes to a critical state:
>>
>> [1186719373] EXTERNAL COMMAND:
>> PROCESS_SERVICE_CHECK_RESULT;node03;NodeState;2;node03 DOWN at 1186719373
>>
>> and starting from this moment, external checks are read from the command
>> file with 9-10 second intervals, with a "service alert" and notification
>> at the end of each activity burst:
>
> This is probably a result of your host check. When a service initially
> returns a non-OK state, Nagios stops everything to perform the host
> check, up to max_check_attempts for that host. Once that is complete,
> Nagios will start performing other tasks again. You'll most likely want
> to remove your host's check_command entirely.
>
>> Then the service goes up, and then after a while I am seeing the
>> following log entries:
>>
>> [1186719447] EXTERNAL COMMAND:
>> PROCESS_SERVICE_CHECK_RESULT;node03;NodeState;node03 up at 1186719447
>> [1186719447] Warning: The results of service 'NodeState' on host
>> 'node03' are stale by 11 seconds (threshold=60 seconds). I'm forcing an
>> immediate check of the service.
>
> I don't know about this one.
>
>> Is there a way to speed up the processing of CRITICAL service checks?
>> I'd like to get a notification within the same second.
>
> I won't say it's not possible but it feels very aggressive to me based
> on my experience. I know there are/were others on the list trying to
> monitor at or close to that resolution but I don't know how successful
> they've been. Perhaps they'll chime in if they're still around.
>
> --
> Marc