Where I work, the server engineers want Nagios to notify them fairly quickly when a problem develops. During the day, the current settings work fine. Recently, however, the nightly backups and scheduled antivirus scans have begun causing enough load that monitored hosts become briefly unavailable, but for long enough that Nagios sends notifications that reach the engineers' pagers.
What strategies do you use to deal with this?

The last time I ran into this, I had two service template files that specified different max_check_attempts and retry_interval values for day and night. A cron job copied the appropriate template file to the name Nagios was configured to load and then restarted Nagios. As we upgraded things, the problem went away, so I ditched that setup. It always seemed like a kludge: scheduled restarts just smell like failure to me, and they don't scale well when you have several thousand hosts and services.

Well, our server estate has continued to expand, and now we're back to scoring own goals with the midnight pages. This time I'm thinking about defining escalations with different timeperiods, but I'm curious to find out what other approaches have worked for people.

Thanks!

-Chris
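For reference, here is a minimal Python sketch of the old day/night template swap described above. The file paths, the 22:00 to 06:00 night window, and the `service nagios restart` command are assumptions for illustration, not details of the original setup; the two template files would differ only in their max_check_attempts and retry_interval values.

```python
import shutil
import subprocess
from datetime import datetime, time
from pathlib import Path
from typing import Optional

# Hypothetical file names: adjust to your own layout.
DAY_TEMPLATE = Path("/usr/local/nagios/etc/templates-day.cfg")
NIGHT_TEMPLATE = Path("/usr/local/nagios/etc/templates-night.cfg")
# The file that nagios.cfg loads (e.g. via a cfg_file= line); also hypothetical.
ACTIVE_TEMPLATE = Path("/usr/local/nagios/etc/templates-active.cfg")

# Assumed "night" window during which backups and AV scans run.
NIGHT_START = time(22, 0)
NIGHT_END = time(6, 0)

# Assumed restart command; the right one depends on your init system.
RESTART_CMD = ["service", "nagios", "restart"]


def is_night(now: time, start: time = NIGHT_START, end: time = NIGHT_END) -> bool:
    """Return True if `now` is inside [start, end), handling windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def choose_template(now: time) -> Path:
    """Pick the template whose check settings suit the time of day."""
    return NIGHT_TEMPLATE if is_night(now) else DAY_TEMPLATE


def swap_templates(now: Optional[datetime] = None, dry_run: bool = False) -> Path:
    """Copy the appropriate template into place and restart Nagios if it changed.

    With dry_run=True, only report which template would be used.
    """
    if now is None:
        now = datetime.now()
    source = choose_template(now.time())
    if dry_run:
        return source
    # Skip the restart when the right template is already active.
    if ACTIVE_TEMPLATE.exists() and ACTIVE_TEMPLATE.read_bytes() == source.read_bytes():
        return source
    shutil.copyfile(source, ACTIVE_TEMPLATE)
    subprocess.run(RESTART_CMD, check=True)
    return source
```

A one-line wrapper calling `swap_templates()`, run from cron at the two boundary times (for example `0 6,22 * * *`), would reproduce that setup, including the restart it forces, which is exactly what made it feel like a kludge.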