So does anyone have any ideas as to how I can resolve this situation? It continues to be an annoyance. Thanks.
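
To make the symptom concrete, here is a toy Python model of the behaviour described in the quoted message below. It is not Nagios code and does not claim to reflect how Nagios decides host states internally. It assumes that a child host that fails its own check is classified as "down" while its parent looks up and as "unreachable" while its parent looks down (even briefly), and that a notification goes out on every transition into "down" but never for "unreachable". The function names and the timeline are hypothetical.

```python
# Toy model only: this is NOT Nagios code. Assumed rules (not confirmed by the
# thread): a failing child is DOWN while its parent is UP and UNREACHABLE
# otherwise, and a notification is sent on each transition into DOWN.

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def child_state(child_responds: bool, parent_state: str) -> str:
    """Classify a child host from its own check result and its parent's state."""
    if child_responds:
        return UP
    # A failing child behind a failing parent is reported as UNREACHABLE.
    return DOWN if parent_state == UP else UNREACHABLE


def count_down_notifications(parent_states) -> int:
    """Count DOWN notifications for a child that never answers its checks."""
    notifications = 0
    previous = UP
    for parent_state in parent_states:
        current = child_state(False, parent_state)
        # Assumed rule: notify once whenever the state changes to DOWN.
        if current == DOWN and previous != DOWN:
            notifications += 1
        previous = current
    return notifications


# Hypothetical timeline: the parent occasionally fails a single check
# (a soft down) and recovers on the next one.
timeline = [UP, UP, DOWN, UP, UP, DOWN, UP]
print(count_down_notifications(timeline))  # 3
print(count_down_notifications([UP] * 7))  # 1
```

Under these assumptions, each brief blip on the parent produces a fresh "down" notification for a child that has been down the whole time, which matches the repeated messages reported in the thread.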
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------

On Mar 31, 2009, at 8:17 AM, Israel Brewster wrote:

> On Mar 31, 2009, at 1:09 AM, Andreas Ericsson wrote:
>
>> Israel Brewster wrote:
>>> Does nagios (3.0.3) mark a child host as unreachable when its
>>> parent enters a soft down state? I am finding myself getting
>>> repeated down messages for a host (which is, in fact, down), even
>>> though I have notifications set to only send a single message.
>>> Looking at the logs, it would appear that what is happening is
>>> that the host is flipping between "down" (which notifies me) and
>>> "unreachable" (which does not). The parent host, however, never
>>> enters a hard down state. Looking at the logs, what I see is that
>>> one ICMP check fails, throwing the host into a soft down state,
>>> but the next one works just fine, bringing it back to an up state.
>>> The logic works fine for the parent host- since it never hits a
>>> hard down state, it doesn't alert, and everyone is happy. But
>>> apparently with the child host every time this happens, it
>>> switches from critical to unreachable and back again, triggering a
>>> notification. Is there any way to keep this from happening? Thanks.
>>
>> Doesn't flapping detection do what you want? You'd get a few
>> notifications, but they'd stop after the 3rd flip or something, I
>> think.
>
> Flapping detection helps, but doesn't solve. For one thing, as you
> mentioned, you still get at least a couple of notifications before it
> kicks in. For another thing, this happens with a frequency of
> something like once an hour or so (not consistently), so the host will
> flip from down to unreachable and back again, triggering an e-mail,
> perhaps do it a second time, and then it will sit in the correct
> "down" state for the next 50 checks or so (thus canceling any flapping
> detection) before repeating the process. It's not like I'm getting
> messages every five minutes or anything, it's just that I'm getting
> repeated down messages every hour or two for hosts that have been down
> and haven't actually changed state.
>
> I could, of course, schedule down time, except that I want to be
> notified if/when the people in the remote station get their act
> together and get the machine(s) in question back online. Also that is
> only partially effective for machines that have been sent in for
> repair, because I don't really know when the scheduled down time will
> be over. They are down, I know they are down, I just don't want to be
> told about it every few hours :-)
>
> -----------------------------------------------
> Israel Brewster
> Computer Support Technician II
> Frontier Flying Service Inc.
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7250 x293
> -----------------------------------------------
>
>>
>>
>> --
>> Andreas Ericsson andreas.erics...@op5.se
>> OP5 AB www.op5.se
>> Tel: +46 8-230225 Fax: +46 8-230231
>>
>> Considering the successes of the wars on alcohol, poverty, drugs and
>> terror, I think we should give some serious thought to declaring war
>> on peace.
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null