We have Nagios monitoring a variety of services on roughly 50 separate
servers. Several of them are mail servers, but only the "main" one
(which hosts most of the Nagios notification recipients) has this
problem.

The mail server will start to become unresponsive to just about any
input (though it still answers pings). Simultaneously, Nagios, which
is on a separate server, will send out notifications that every service
on every server is down because Nagios cannot reach them. Since almost
all of the notifications go through this problem mail server, including
those that forward to text messaging services, they stop arriving and
resume only when the mail server is rebooted or otherwise brought back
to life, sometimes by restarting the LDAP server process on it.
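
To make "unresponsive but still pings" more concrete, here is a minimal
sketch of the kind of check I have in mind. It is not what Nagios or
any plugin does; the function name and the three status labels are my
own, and it only assumes that a healthy SMTP daemon greets a new
connection with a 220 banner.

```python
import socket


def smtp_banner_status(host, port=25, timeout=5.0):
    """Classify an SMTP endpoint as 'ok', 'no-banner', or 'unreachable'.

    Hypothetical helper for illustration; the labels are not Nagios states.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            try:
                banner = conn.recv(512)
            except socket.timeout:
                # TCP connect succeeded but the daemon never greeted us.
                return "no-banner"
    except OSError:
        # Connection refused, reset, timed out, or no route to host.
        return "unreachable"
    # A healthy SMTP server opens with a "220 ..." greeting line.
    return "ok" if banner.startswith(b"220") else "no-banner"
```

Calling it against the mail server's hostname while the problem is
happening would show whether the box still accepts TCP connections on
port 25 and simply never answers ("no-banner") or whether connections
fail outright ("unreachable").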

There are perhaps a few dozen total email destinations for
notifications. Even multiplying this by the total number of services
that Nagios monitors, it doesn't seem likely that the volume of emails
generated by Nagios alone would cause all this. It is a fairly modern,
multiprocessor server (CentOS/Sendmail).
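
For reference, the multiplication above can be sketched as below. Only
the 50 servers comes from our setup; the services-per-server and
recipient counts are placeholder assumptions, not measurements, and the
function name is made up for illustration.

```python
def worst_case_notifications(services, recipients):
    """Upper bound: every monitored service notifies every recipient once."""
    return services * recipients


# Assumed figures for illustration only.
servers = 50                # from the setup described above
services_per_server = 10    # assumption, not measured
recipients = 36             # "a few dozen", assumption

burst = worst_case_notifications(servers * services_per_server, recipients)
print(burst)  # 18000
```

The figure is an upper bound on a single burst and only means something
when compared with what Sendmail on this machine normally handles.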

Can anyone offer any insight or similar experiences?

Thanks in Advance!

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
