On Jan 8, 2010, at 2:55 PM, gmartin wrote:

Israel,
I believe you are correct.  I'll be interested to hear what others have to say on the inner workings.  In the meantime, can the problem be solved if the event handler for Service B is written to restart service A if it is down? (Perhaps it calls the same Nagios check from the command line and acts on the results.)

Yeah, that should work, at least for my specific situation. Of course, doing so greatly reduces the utility of having the dependency in the first place. The situations under which it would be triggered (given Nagios restarting service A as soon as it detects it as down) would be somewhat rare, and even when triggered it would no longer be needed, since the service B event handler does its own dependency checking.

The only time the dependency would apply (assuming our understanding is right) is when Nagios detects A as down and then tries to run a check on B before verifying that A is back up. Of course, even then it wouldn't matter, since a) Nagios should have restarted service A immediately (so a straight restart of B would be fine), and b) even if Nagios didn't, the new event handler for service B would. At that point there is no need for the dependency at all, since the event handler takes care of the dependencies. Basically, if the dependency only applies when Nagios ALREADY knows service A is down, then the dependency is useless, at least in this situation. Of course, if this is just the way dependencies work, then there may be no other option. Thanks for the feedback.
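For reference, the event handler idea discussed above might look roughly like the following. This is only a sketch: the function name and the `check_a`, `restart_a` and `restart_b` callables are hypothetical stand-ins for whatever plugin call and restart commands are actually in use, and the state strings are assumed to be passed in the way Nagios passes `$SERVICESTATE$` and `$SERVICESTATETYPE$` to a handler.

```python
from typing import Callable, List

OK = 0  # standard Nagios plugin exit code for an OK result


def handle_service_b(state: str, state_type: str,
                     check_a: Callable[[], int],
                     restart_a: Callable[[], None],
                     restart_b: Callable[[], None]) -> List[str]:
    """Hypothetical event handler logic for service B.

    Before restarting B it runs a fresh check of A (instead of trusting
    the last result Nagios has on file) and restarts A first if needed.
    Returns the list of actions taken, in order.
    """
    actions: List[str] = []
    # Only act on a hard CRITICAL state, as described in the thread.
    if state != "CRITICAL" or state_type != "HARD":
        return actions
    # Fresh check of A; any non-OK exit code is treated as "down".
    if check_a() != OK:
        restart_a()
        actions.append("restart A")
    restart_b()
    actions.append("restart B")
    return actions
```

In a real handler, `check_a` would presumably wrap a call to the same check plugin Nagios uses for service A and return its exit code.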


\\Greg



On Fri, Jan 8, 2010 at 6:07 PM, Israel Brewster <isr...@frontierflying.com> wrote:
Here's the situation: running Nagios 3.2.0, I have two services, we'll call them A and B. Both have event handlers such that if they register a hard critical state, Nagios attempts to restart them. Service B depends on service A, such that when service A goes down, service B does as well, so both need to be restarted, with A restarted first. I have a servicedependency set up in Nagios specifying service B's dependency on service A.

My understanding is that the way this works is that when nagios goes to check service B, it first looks at the "current" state (as defined by the last nagios check) of service A, and, if the execution_failure_criteria matches (i.e. if service A is down) nagios does not run the check on service B, thus not running the event handler to attempt to restart B until A is back up. This is good. But what happens in the following scenario?
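The understanding described above can be modelled in a few lines. This is a sketch of that mental model only, not of the actual Nagios internals; the function name is made up, and the single-letter states follow the option letters used by `execution_failure_criteria` (`o`, `w`, `u`, `c`, `p`).

```python
def should_run_dependent_check(last_known_master_state: str,
                               execution_failure_criteria: set) -> bool:
    """Decide whether the check of the dependent service (B) may run.

    last_known_master_state is the state of the master service (A) as of
    its most recent check, i.e. possibly stale data.
    """
    # The check is suppressed only if the cached state matches the criteria.
    return last_known_master_state not in execution_failure_criteria


# With criteria "c,u", a cached CRITICAL for A suppresses the check of B,
# while a cached OK lets it run even if A has crashed since its last check.
print(should_run_dependent_check("c", {"c", "u"}))
print(should_run_dependent_check("o", {"c", "u"}))
```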

Service A is scheduled to check every 5 minutes.
1) Nagios does a normally scheduled check of service A, finding it to be OK.
2) One minute later, Service A crashes
3) One minute after that (three minutes prior to the next regular check of service A), thanks to nagios staggering checks, Nagios goes to do a normal check of service B

Now, to my understanding of this scenario, the check on service B would run normally, since the last check on A was OK and Nagios uses cached results for dependency checks. Since service A is actually critical, service B will be critical as well. The problem with this is that Nagios will respond by attempting to restart service B, which will invariably fail since service A is still down. Once the next regular check time for service A is reached, Nagios will detect service A as down and restart it, but service B will never get restarted successfully, since Nagios already tried and failed.
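The timeline above can be replayed with a small simulation. It is a deliberately simplified sketch under the assumptions stated in this message: one-minute ticks, A checked every 5 minutes, the dependency evaluated against the cached state of A, and no soft-state retries. All names are hypothetical.

```python
def simulate(minutes=6, a_interval=5, a_crash_at=1, b_check_at=2):
    """Replay the scenario and return a list of (minute, event) tuples."""
    a_up = True        # real state of service A
    a_cached = "OK"    # state of A as of its last check
    events = []
    for t in range(minutes):
        if t == a_crash_at:
            a_up = False
            events.append((t, "A crashes"))
        if t % a_interval == 0:
            # Scheduled check of A refreshes the cached state.
            a_cached = "OK" if a_up else "CRITICAL"
            events.append((t, "check A -> " + a_cached))
            if not a_up:
                a_up = True  # A's event handler restarts it
                events.append((t, "restart A"))
        if t == b_check_at:
            if a_cached != "OK":
                # Dependency matches: the check of B is suppressed.
                events.append((t, "check B skipped (dependency)"))
            elif not a_up:
                # Cached state is stale, so the check runs and B is found down.
                events.append((t, "check B -> CRITICAL"))
                events.append((t, "restart B fails (A still down)"))
            else:
                events.append((t, "check B -> OK"))
    return events


for minute, event in simulate():
    print(minute, event)
```

With the default arguments, the failed restart of B at minute 2 comes before A is detected as down and restarted at minute 5, which is the ordering problem described above.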

Is this correct? If so, what can be done about it? Or is Nagios smart enough to schedule its service checks to avoid this scenario? It seems that the most logical solution (if possible) would be to mirror the service/host check logic. That is, when a check of service B comes back as critical, immediately check service A. If service A is critical, then don't declare service B to be critical until service A is OK, at which point B would enter a hard down state and run the event handler. Alternatively, if I could say something like "always check service A immediately before checking service B" to make sure our data is current, that would work as well, although I could see it resulting in excessive checking of service A, which may be less desirable. What do you guys think?
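The first proposal above (mirroring the service/host check logic) could be sketched as follows. This describes the suggested behaviour, not anything Nagios 3.2.0 is known to do; the function name, the `check_a_now` callable and the returned strings are hypothetical.

```python
from typing import Callable


def evaluate_b(b_result: str, check_a_now: Callable[[], str]) -> str:
    """Decide what to do with a fresh check result for service B."""
    if b_result != "CRITICAL":
        return "no action"
    # B looks down: check A on demand instead of relying on cached data.
    if check_a_now() == "CRITICAL":
        # A is the real problem, so hold B back until A has recovered.
        return "defer B until A is OK"
    # A is fine, so B is down on its own and its handler may run.
    return "B hard critical: run event handler"
```

The on-demand check of A only happens when B comes back critical, which avoids the excessive checking of A that the "always check A first" alternative might cause.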
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------




------------------------------------------------------------------------------
This SF.Net email is sponsored by the Verizon Developer Community
Take advantage of Verizon's best-in-class app development support
A streamlined, 14 day to market process makes app distribution fast and easy
Join now and get one step closer to millions of Verizon customers
http://p.sf.net/sfu/verizon-dev2dev
_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null




