Hello all,

We have a high-availability Nagios setup, with two hosts in different data centers. By default, our secondary Nagios installation runs with notifications disabled, and enables them when it senses the primary has gone down, using the example event handler scripts distributed with Nagios.
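As a rough illustration of that failover decision, here is a minimal Python sketch. It is not the distributed example scripts themselves; the function name `failover_command` and the exact state handling are assumptions made for illustration. `ENABLE_NOTIFICATIONS` and `DISABLE_NOTIFICATIONS` are standard Nagios external commands.

```python
import time


def failover_command(primary_state, state_type, now=None):
    """Return the external command line the secondary would submit, or None.

    Hypothetical helper: primary_state is the state of the check that
    watches the primary (host states UP/DOWN or service states
    OK/CRITICAL); state_type is "SOFT" or "HARD".
    """
    ts = int(time.time() if now is None else now)

    # Assumption: only act on HARD states, so a single failed check
    # of the primary does not flip notifications on the secondary.
    if state_type != "HARD":
        return None

    if primary_state in ("DOWN", "CRITICAL"):
        # Primary looks dead: the secondary takes over notifying.
        return f"[{ts}] ENABLE_NOTIFICATIONS"
    if primary_state in ("UP", "OK"):
        # Primary is back: the secondary goes quiet again.
        return f"[{ts}] DISABLE_NOTIFICATIONS"
    return None
```

The returned line, followed by a newline, would be written to the external command file named by the `command_file` setting in nagios.cfg.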
This all works great, however... If we lose our primary and happen to lose other hosts/services at the same time, there is a possibility of notifications not being sent!

Suppose the secondary Nagios host decides that a service is in a non-OK state, but does not notify because its notifications are disabled. At the same time, we lose the primary server, which has not sent a notification either, because it has not checked this service yet. Notifications then get enabled on the secondary. Any state changes that occur after that get notified, but the non-OK service that was detected during that short window slips through the cracks... Does this make sense?

Anyway, we are more than happy to just re-send all notifications for non-OK conditions on the secondary server as it becomes primary. Is there any way to do this, though?

At first I, possibly naively, thought that I could just set the notification count to zero with an external command. This of course had no effect. I have now made a script that sets all non-OK services/hosts to "unknown" and then schedules them for an immediate recheck... That, too, does not seem to be working.

Thanks in advance for any insight anyone can offer!

--
Robert King - Ingenta, Inc.
UNIX Systems Administrator
GPG Public key: http://tinyurl.com/9zmws
