On Mar 31, 2009, at 1:09 AM, Andreas Ericsson wrote:

> Israel Brewster wrote:
>> Does nagios (3.0.3) mark a child host as unreachable when its
>> parent enters a soft down state? I am finding myself getting
>> repeated down messages for a host (which is, in fact, down), even
>> though I have notifications set to only send a single message.
>> Looking at the logs, it would appear that what is happening is
>> that the host is flipping between "down" (which notifies me) and
>> "unreachable" (which does not). The parent host, however, never
>> enters a hard down state. Looking at the logs, what I see is that
>> one ICMP check fails, throwing the host into a soft down state,
>> but the next one works just fine, bringing it back to an up state.
>> The logic works fine for the parent host- since it never hits a
>> hard down state, it doesn't alert, and everyone is happy. But
>> apparently with the child host every time this happens, it
>> switches from critical to unreachable and back again, triggering a
>> notification. Is there any way to keep this from happening? Thanks.
>
> Doesn't flapping detection do what you want? You'd get a few
> notifications, but they'd stop after the 3rd flip or something, I
> think.

Flapping detection helps, but doesn't solve the problem. For one thing, as you mentioned, you still get at least a couple of notifications before it kicks in. For another, this happens roughly once an hour (not consistently): the host flips from down to unreachable and back again, triggering an e-mail, perhaps does it a second time, and then sits in the correct "down" state for the next 50 checks or so (thus canceling any flapping detection) before repeating the process. It's not like I'm getting messages every five minutes or anything; it's just that I'm getting repeated down messages every hour or two for hosts that have been down and haven't actually changed state.

I could, of course, schedule downtime, except that I want to be notified if/when the people at the remote station get their act together and get the machine(s) in question back online. Scheduled downtime is also only partially effective for machines that have been sent in for repair, because I don't really know when the downtime will be over. They are down, I know they are down, I just don't want to be told about it every few hours :-)

-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------

> --
> Andreas Ericsson                   andreas.erics...@op5.se
> OP5 AB                             www.op5.se
> Tel: +46 8-230225                  Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.
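
To illustrate why an occasional down/unreachable blip separated by dozens of stable checks is unlikely to trip flap detection, here is a rough sketch. It is a simplified, unweighted model and not Nagios's actual algorithm (which, as far as I know, weights recent state changes more heavily). The function name percent_state_change and the WINDOW size are assumptions made for this example only.

```python
from typing import Sequence


def percent_state_change(history: Sequence[str]) -> float:
    """Return the unweighted share of adjacent results that differ, in percent.

    Hypothetical helper for illustration; it is not part of Nagios.
    """
    if len(history) < 2:
        return 0.0
    # Count positions where a check result differs from the one before it.
    transitions = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * transitions / (len(history) - 1)


# Number of recent results considered; an assumption for this sketch.
WINDOW = 21

# One DOWN -> UNREACHABLE -> DOWN blip inside the window:
# 2 transitions out of 20 possible.
one_blip = ["DOWN"] * 10 + ["UNREACHABLE"] + ["DOWN"] * 10

# After ~50 stable checks the blip has scrolled out of the window entirely.
stable = ["DOWN"] * WINDOW

print(percent_state_change(one_blip))  # 10.0
print(percent_state_change(stable))    # 0.0
```

Under this simplified model a single blip yields only a 10% state change, and the value falls back to 0% long before the next blip arrives, so whether flapping is ever declared depends on where the configured high threshold sits relative to that figure.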