I've been having an issue with multiple Nagios installations over a span of a few years, and after setting up another installation over the last few months it's biting me again.
Sometimes, but not always, issuing a HUP to Nagios causes bad things to happen. The configuration won't reload (for example, new services don't appear in the web interface), and the Nagios socket stops accepting input.
In the case of my current installation, this means that I get a crapload of hung NSCA processes from the external boxes that submit checks for ~4000 services to the "master" box. Within an hour or so, this will pretty much take out the master, as the number of waiting NSCA processes exhausts the server's file handles. If it's not caught in time, it takes a server reboot to recover.
I'm working around it by training people *not* to send HUPs to Nagios and to do full reloads instead, but I'd love to figure out why this happens and/or whether it can be fixed.
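As a stopgap for catching this before the file handles run out, the pile-up of NSCA processes could be watched directly. Below is a minimal sketch, not a fix for the underlying problem. It assumes a Linux box that exposes `/proc/<pid>/comm` and that the hung processes show up under the command name `nsca`; the function names and the threshold of 200 are made up for illustration and do not come from Nagios or NSCA.

```python
import os


def count_processes(name, proc_root="/proc"):
    """Count processes whose command name equals `name` (Linux /proc layout)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to PIDs
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            continue  # process exited, or its comm file is not readable
    return count


def nsca_backlog_exceeded(threshold=200, name="nsca", proc_root="/proc"):
    """Return True when more than `threshold` processes named `name` exist.

    Both `threshold` and `name` are assumptions; adjust them for the local setup.
    """
    return count_processes(name, proc_root) > threshold
```

Run from cron every few minutes, a check like `nsca_backlog_exceeded()` could page someone or trigger a full restart well before the hour it takes for the master to fall over.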
