Marc Powell wrote:
On Feb 23, 2009, at 9:43 AM, Sergio Ariel wrote:

My problem is that when these hosts are DOWN, Nagios waits 30 seconds
trying to execute the service check. After these 30 seconds, it reports "CRITICAL SERVICE". I want to prevent Nagios from checking any service on a host that is DOWN.
Nagios isn't designed to do this; you'll need to jump through hoops to accomplish it. At the least you need to be running nagios-3 with active host checks configured. I'd suggest you look at creating an event handler for your hosts that issues the external command 'DISABLE_HOST_SVC_CHECKS' when the host is non-OK and issues the external command 'ENABLE_HOST_SVC_CHECKS' when the host recovers.
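A minimal sketch of the event handler Marc describes, written in Python. Assumptions: the handler is wired up in the host definition with a command line passing $HOSTSTATE$ and $HOSTNAME$ (the script name in the comment is hypothetical), and the command file path is a typical source-install default; use the command_file value from your own nagios.cfg. Only the external command names DISABLE_HOST_SVC_CHECKS and ENABLE_HOST_SVC_CHECKS come from Marc's message.

```python
import sys
import time
from typing import Optional

# Assumption: typical source-install path; use command_file from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(name: str, host: str, now: Optional[int] = None) -> str:
    """Format one external command line: [timestamp] NAME;host_name"""
    ts = int(time.time()) if now is None else now
    return f"[{ts}] {name};{host}\n"


def command_for_host_state(state: str, host: str,
                           now: Optional[int] = None) -> Optional[str]:
    """Map the $HOSTSTATE$ macro to the command to send, or None."""
    if state in ("DOWN", "UNREACHABLE"):
        return external_command("DISABLE_HOST_SVC_CHECKS", host, now)
    if state == "UP":
        return external_command("ENABLE_HOST_SVC_CHECKS", host, now)
    return None


def main(state: str, host: str) -> None:
    line = command_for_host_state(state, host)
    if line is not None:
        # The command file is a named pipe read by the Nagios daemon.
        with open(COMMAND_FILE, "w") as pipe:
            pipe.write(line)


# Hypothetical handler command line: host_svc_toggle.py $HOSTSTATE$ $HOSTNAME$
# Only act when called with exactly those two arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Host event handlers run on SOFT as well as HARD state changes, so this sketch disables service checks as soon as the first failed host check is seen; filtering on $HOSTSTATETYPE$ would instead wait for the HARD state, at the cost of a longer window.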

Hi,

I posted the same problem a few months ago.

I tried Marc's workaround using event handlers and external commands. I also tried another clever workaround using a service dependency: you create an explicit 'check_host_alive' service, then a service dependency so that none of the host's other services are checked if that main service fails. Wildcards can be helpful here. Search the list archive for my name, 'OTTAVI', and you'll find more details about both workarounds.
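For hosts with many services, the dependency objects can be generated rather than typed by hand. The Python sketch below writes one servicedependency definition per service, making each depend on 'check_host_alive' on the same host. Assumptions: the host name and service names in the usage line are made up, and the failure criteria "u,c" (skip checks and suppress notifications while the master is UNKNOWN or CRITICAL) are a choice you may want to adjust; this does not use the wildcard approach mentioned above.

```python
from typing import List

# Criteria letters: o=OK, w=WARNING, u=UNKNOWN, c=CRITICAL, p=PENDING, n=none.
# Assumption: block dependents only while the master is UNKNOWN or CRITICAL.
FAILURE_CRITERIA = "u,c"


def service_dependency(host: str, master: str, dependent: str) -> str:
    """One servicedependency object: `dependent` depends on `master`."""
    return "\n".join([
        "define servicedependency {",
        f"    host_name                      {host}",
        f"    service_description            {master}",
        f"    dependent_host_name            {host}",
        f"    dependent_service_description  {dependent}",
        f"    execution_failure_criteria     {FAILURE_CRITERIA}",
        f"    notification_failure_criteria  {FAILURE_CRITERIA}",
        "}",
    ]) + "\n"


def dependencies_for_host(host: str, services: List[str],
                          master: str = "check_host_alive") -> str:
    """Make every other service on `host` depend on the master service."""
    return "".join(service_dependency(host, master, s)
                   for s in services if s != master)


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(dependencies_for_host("web01", ["check_host_alive", "HTTP", "SSH"]))
```

The master service is skipped so that it never ends up depending on itself.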

Anyway, neither of these workarounds completely solves the problem. If some service checks are scheduled BEFORE the event handler triggers, or before the dependency takes effect, those checks will still return a 'FAILED' status. Reducing the check interval of the 'parent' check_host_alive service helps somewhat, but checks that are scheduled before the failure is detected will still report 'FAILED'.
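The timing problem can be made concrete with a small model, sketched in Python below. Assumptions: the master service is checked at fixed times (offset, offset + interval, ...), it needs max_check_attempts - 1 retries before reaching a HARD state, and dependencies act only on HARD states (as I understand it, the default unless soft_state_dependencies is enabled in nagios.cfg). Any dependent check that starts between the failure and that moment runs against the dead host and ends up CRITICAL.

```python
import math
from typing import List


def detection_time(t_fail: float, interval: float, offset: float = 0.0,
                   retries: int = 0, retry_interval: float = 0.0) -> float:
    """When the master service reaches the HARD state that blocks dependents.

    Master checks run at offset, offset + interval, ...; the first one at or
    after t_fail sees the failure, then `retries` further checks spaced
    retry_interval apart are needed before the state becomes HARD.
    """
    k = max(0, math.ceil((t_fail - offset) / interval))
    first_seen = offset + k * interval
    return first_seen + retries * retry_interval


def leaked_checks(t_fail: float, t_detect: float,
                  schedule: List[float]) -> List[float]:
    """Dependent check start times that fall inside the blind window."""
    return [t for t in schedule if t_fail <= t < t_detect]
```

For example, with the host failing at t = 100 s, the master checked every 60 s and 2 retries at 30 s, detection_time(100, 60, retries=2, retry_interval=30) gives 180.0, and leaked_checks(100, 180, [90, 110, 130, 175, 200]) gives [110, 130, 175]. The worst-case window is roughly interval + retries * retry_interval, so shortening the master's interval shrinks it but never closes it.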

I've had this problem for months, and I haven't found a suitable solution.

Nagios has a 'parent/child' relationship system, which could help in this situation, but it works only for hosts. There is no parent/child relationship between services, or between hosts and services; such a relationship would solve our problem completely! Let's hope the developers will take this problem into account in future versions.

Another idea for a better workaround would be to use an event handler not only to disable all service checks, but also to put all of them in an 'UNKNOWN' state (simulating the parent/child/unreachable logic). Then, even if the event handler triggers AFTER some (failed) service checks, the 'FAILED' status would be replaced by a more accurate 'UNKNOWN' status. Unfortunately, I didn't manage to write such a script to run as an event handler, and I don't know whether it is possible; my knowledge of scripting is quite limited. Maybe a Perl or Bash guru could help us write such a script?
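One possible shape for that handler, sketched in Python and untested against a live Nagios: on DOWN or UNREACHABLE it sends DISABLE_HOST_SVC_CHECKS, then submits a passive UNKNOWN result (return code 3) for each service with the PROCESS_SERVICE_CHECK_RESULT external command; on UP it re-enables checks and forces an immediate recheck with SCHEDULE_FORCED_HOST_SVC_CHECKS. Assumptions: the services accept passive checks, the command file path is a typical default, and the SERVICES map and script name are hypothetical, because Nagios does not hand a host event handler the list of that host's services. A check that is already running when the handler fires may still finish afterwards and overwrite the UNKNOWN.

```python
import sys
import time
from typing import Dict, List

# Assumption: typical source-install path; use command_file from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Hypothetical host -> service descriptions map; the handler must be told
# which services belong to the host.
SERVICES: Dict[str, List[str]] = {"web01": ["HTTP", "SSH", "Disk"]}

STATE_UNKNOWN = 3  # plugin return code for UNKNOWN


def commands_for(state: str, host: str, services: List[str],
                 now: int) -> List[str]:
    """Build external command lines for one host state change."""
    if state in ("DOWN", "UNREACHABLE"):
        # Stop active service checks first...
        cmds = [f"[{now}] DISABLE_HOST_SVC_CHECKS;{host}"]
        # ...then overwrite each service's status with a passive UNKNOWN,
        # replacing any CRITICAL produced by checks that ran too early.
        for svc in services:
            cmds.append(
                f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{svc};"
                f"{STATE_UNKNOWN};UNKNOWN - host {host} is {state}"
            )
        return cmds
    if state == "UP":
        # Re-enable active checks and recheck now so real results
        # replace the UNKNOWN ones.
        return [
            f"[{now}] ENABLE_HOST_SVC_CHECKS;{host}",
            f"[{now}] SCHEDULE_FORCED_HOST_SVC_CHECKS;{host};{now}",
        ]
    return []


def main(state: str, host: str) -> None:
    cmds = commands_for(state, host, SERVICES.get(host, []), int(time.time()))
    if cmds:
        with open(COMMAND_FILE, "w") as pipe:
            pipe.write("\n".join(cmds) + "\n")


# Hypothetical handler command line: unknown_on_down.py $HOSTSTATE$ $HOSTNAME$
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The plugin output text deliberately avoids semicolons, since they separate the fields of an external command.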

Kind regards,
--

*Toussaint OTTAVI*
*MEDI INFORMATIQUE*


_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
