I've got a Nagios-3.0.4 server monitoring 3,290 services on 387 hosts. When the nagios service is initially started, service and host latency is great. This usually continues for about 2-3 hours and then we start seeing fork errors in the log like so:
[1234425582] Warning: The check of service 'ssh' on host 'mail02' could not be performed due to a fork() error: 'Cannot allocate memory'. The check will be rescheduled.

At about the same time, we start seeing lots of orphaned /tmp/checkXXXXXX files and indications that the max concurrent checks value has been reached:

[1234458853] Max concurrent service checks (500) has been reached. Delaying further checks until previous checks are complete...

During this period there is 2GB of free memory and 1.2GB of cache available out of the 4GB on the Nagios server, so I suspect something other than system RAM is being exhausted.

Naturally, once this starts happening, the latencies increase and seem to settle at around 98 seconds. Interestingly enough, this also causes the load to drop to nearly nothing.

We have already set the following in nagios.cfg:

service_reaper_frequency=2
use_large_installation_tweaks=1
enable_environment_macros=0

If we enable the embedded Perl interpreter, the fork errors start much sooner after a restart (minutes instead of hours).

The nagios user's ulimits look like this:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 65600
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65600
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

We'll likely give 3.0.6 a try soon to see if it magically fixes the issue, even though the changelog doesn't mention anything obviously relevant.
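To pin down how long after a restart the fork errors begin, and how they line up with the max-concurrent-checks warnings, a small script can scan nagios.log for the two messages quoted above. This is a minimal sketch, assuming every entry keeps the `[epoch] message` shape shown in those examples; the matching substrings are copied from the quoted messages, and the script itself is a hypothetical helper, not something shipped with Nagios.

```python
import re
import sys
from datetime import datetime, timezone

# Nagios log entries start with a Unix epoch timestamp in square brackets,
# e.g. "[1234425582] Warning: The check of service 'ssh' ..."
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")

# Substrings taken from the two messages quoted in the report (assumed stable).
PATTERNS = {
    "fork_error": "could not be performed due to a fork() error",
    "max_concurrent": "Max concurrent service checks",
}


def scan(lines):
    """Return {label: (count, first_epoch)} for each symptom in PATTERNS."""
    stats = {label: [0, None] for label in PATTERNS}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a timestamped log entry
        epoch, message = int(m.group(1)), m.group(2)
        for label, needle in PATTERNS.items():
            if needle in message:
                stats[label][0] += 1
                if stats[label][1] is None:
                    stats[label][1] = epoch  # remember the first occurrence
    return {label: (count, first) for label, (count, first) in stats.items()}


def fmt(epoch):
    """Render an epoch timestamp as a readable UTC string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to nagios.log as the first argument.
    with open(sys.argv[1], errors="replace") as fh:
        for label, (count, first) in scan(fh).items():
            when = fmt(first) if first is not None else "never"
            print(f"{label}: {count} occurrences, first at {when}")
```

Run it against the log after a restart and compare the first fork_error timestamp with the restart time; a gap of roughly 2-3 hours would match what is described above, and a much shorter gap with the embedded Perl interpreter enabled would match the minutes-instead-of-hours observation.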
-- 
Jeff Frost, Owner
Frost Consulting, LLC
http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

_______________________________________________
Nagios-users mailing list
https://lists.sourceforge.net/lists/listinfo/nagios-users
