Re: [Nagios-users] Service check on DOWN hosts!!!

Toussaint OTTAVI Thu, 26 Feb 2009 09:08:30 -0800

Marc Powell wrote:

On Feb 23, 2009, at 9:43 AM, Sergio Ariel wrote:
My problem is that when these host are DOWN, Nagios wait 30 seconds
trying to execute the service check. After this 30 seconds, then tell me "CRITICAL SERVICE". I want to avoid Nagios checks any service in DOWN
Nagios isn't designed to do this; you'll need to jump through hoops to accomplish it. At the least you need to be running nagios-3 with active host checks configured. I'd suggest you look at creating an event handler for your hosts that issues the external command 'DISABLE_HOST_SVC_CHECKS' when the host is non-OK and issues the external command 'ENABLE_HOST_SVC_CHECKS' when the host recovers.


Hi,

I already posted the same problem some months ago.

I tried Marc's workaround using event handlers and external commands. I also tried another smart workaround using a service dependency. You create an explicit 'check_host_alive' service, then you create a service dependency, so that all your other services are not checked if the main service fails. Using wildcards can be helpful. Search my name 'OTTAVI' in the list history, and you'll find more details about these two workarounds.
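
As an illustration, the event-handler workaround might look roughly like the following minimal Python sketch. It is not taken from any earlier post on the list. The command file path and the argument order are assumptions: the real path is whatever 'command_file' is set to in nagios.cfg, and the script expects the host event handler to pass $HOSTNAME$ and $HOSTSTATE$ as its two arguments.

```python
import sys
import time

# Assumed location; use the value of 'command_file' from your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_command(host_name, host_state, now=None):
    """Return the external command line matching a host state.

    UP re-enables the service checks of the host; any other state
    (DOWN, UNREACHABLE) disables them.
    """
    timestamp = int(time.time() if now is None else now)
    if host_state == "UP":
        action = "ENABLE_HOST_SVC_CHECKS"
    else:
        action = "DISABLE_HOST_SVC_CHECKS"
    return "[%d] %s;%s\n" % (timestamp, action, host_name)


def main(argv):
    # argv[1] is expected to be $HOSTNAME$, argv[2] $HOSTSTATE$.
    host_name, host_state = argv[1], argv[2]
    # The command file is a named pipe read by the Nagios daemon.
    with open(COMMAND_FILE, "w") as pipe:
        pipe.write(build_command(host_name, host_state))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

The script would be referenced from a command definition and set as the 'event_handler' of the host. External commands must be enabled (check_external_commands=1) for Nagios to read the pipe.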

Anyway, neither of these workarounds completely solves the problem. If some service checks are scheduled BEFORE the event handler triggers, or before the dependency operates, then these service checks will return 'FAILED' status. Some optimization can be done by reducing the check interval for the 'parent' check_alive service, but you will still get some 'FAILED' status for some checks that are scheduled before...


I have had this trouble for months, but I have not found any suitable solution.

Nagios has a 'parent/child' relationship system, which could be helpful in such a situation. But it works only for hosts. There is no parent/child relationship between services, or between hosts and services, which could solve our problem completely! Let's hope the developers will take our problem into consideration for future versions.

Another idea for a better workaround would be to use an event handler not only to disable all service checks, but also to put all of them in an 'UNKNOWN' state (this would simulate the parent/child/unreachable logic). Then, even if the event handler triggers AFTER some (failed) service checks, the 'FAILED' status would be replaced by a more accurate 'UNKNOWN' status. Unfortunately, I didn't manage to do that in a script to be run as an event handler. I don't know if it is possible. But my overall knowledge of scripting is quite poor. Maybe a Perl or Bash guru could help us write such a script?
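
One possible direction, given here only as an untested sketch in Python rather than Perl or Bash: the external command 'PROCESS_SERVICE_CHECK_RESULT' submits a passive result for a service, and return code 3 means UNKNOWN. The helper names, the plugin output text, and the way the list of service descriptions reaches the script are all assumptions, since Nagios does not hand an event handler the list of services on a host. Passive checks would also have to be accepted for those services.

```python
import time

UNKNOWN = 3  # standard plugin return code for the UNKNOWN state


def unknown_results(host_name, service_descriptions, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line per service.

    service_descriptions is a hypothetical, caller-supplied list of the
    service names defined on the host.
    """
    timestamp = int(time.time() if now is None else now)
    output = "Host %s is down, service not checked" % host_name
    return [
        "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n"
        % (timestamp, host_name, service, UNKNOWN, output)
        for service in service_descriptions
    ]


def submit(lines, command_file):
    """Write the command lines to the Nagios external command pipe."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)
```

An event handler could call unknown_results() for the DOWN host and pass the result to submit() together with the path of the command file. Whether the UNKNOWN results reliably replace the earlier 'FAILED' ones would still need to be checked on a real installation.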


Kind regards,
--

*Toussaint OTTAVI*
*MEDI INFORMATIQUE*


