We have Nagios monitoring a variety of services on roughly 50 separate servers. Several of them are mail servers, but only the "main" one (the one that hosts most of the Nagios notification recipients) has this problem.
The mail server starts to become unresponsive to just about any input (although it still responds to pings). At the same time, Nagios, which runs on a separate server, sends out notifications that every service on every server is down because Nagios cannot reach them. Since almost all of the notifications go through this problem mail server, including those that forward to text messaging services, they stop arriving and only resume when the mail server is rebooted or otherwise brought back to life, sometimes by restarting the LDAP server process on it. There are perhaps a few dozen email destinations for notifications in total. Even multiplying this by the total number of services that Nagios monitors, it doesn't seem likely that the volume of email generated by Nagios alone would cause all this. It is a fairly modern, multiprocessor server (CentOS/Sendmail). Can anyone offer any insight or similar experiences? Thanks in advance!