I've got a Nagios 3.0.4 server monitoring 3,290 services on 387
hosts. When the nagios service is first started, service and host
latencies are low. This usually continues for about 2-3 hours, and then
we start seeing fork errors in the log like so:

[1234425582] Warning: The check of service 'ssh' on host 'mail02' could
not be performed due to a fork() error: 'Cannot allocate memory'.  The
check will be rescheduled.
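
To see how quickly these errors ramp up after a restart, something
like the following could bucket them by hour. This is only a sketch:
it assumes each warning sits on a single line in the real log (it is
wrapped above only by the mail client), and the function name
fork_errors_per_hour is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Pattern for the fork() warning quoted above; the leading [digits]
# field is the epoch timestamp Nagios writes on each log line.
FORK_ERROR = re.compile(
    r"^\[(\d+)\] Warning: The check of service '([^']*)' on host '([^']*)' "
    r"could not be performed due to a fork\(\) error"
)


def fork_errors_per_hour(lines):
    """Count fork() warnings per UTC hour (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = FORK_ERROR.match(line)
        if match:
            stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Feeding it the lines of the nagios log file (wherever log_file points
on your install) should show whether the errors start abruptly or
build up gradually.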

At about the same time, we start seeing lots of orphaned
/tmp/checkXXXXXX files and indications that the max concurrent checks
value has been reached:

[1234458853] Max concurrent service checks (500) has been reached. 
Delaying further checks until previous checks are complete...
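
A rough way to quantify the orphaned files is to count the ones that
have been sitting around longer than any check should take. This is a
sketch under assumptions: the five-minute threshold is arbitrary, and
stale_check_files is a made-up helper name, not anything Nagios ships.

```python
import glob
import os
import time


def stale_check_files(directory="/tmp", min_age_seconds=300, now=None):
    """Return check?????? files older than min_age_seconds (hypothetical helper)."""
    now = time.time() if now is None else now
    stale = []
    # "check" followed by exactly six characters, as in /tmp/checkXXXXXX
    for path in glob.glob(os.path.join(directory, "check??????")):
        try:
            age = now - os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # the file was reaped between glob and stat
        if age >= min_age_seconds:
            stale.append(path)
    return sorted(stale)
```

Running len(stale_check_files()) every few minutes and comparing it
with the time of the first fork error would show whether the orphans
pile up before or after the forks start failing.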

It should be noted that while this is happening, the nagios server has
2GB of free memory and 1.2GB of cache out of 4GB total, so I suspect
something other than system RAM is being exhausted.
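
One thing that can be checked besides free RAM is the kernel's commit
accounting. The sketch below assumes a Linux host and reads the
CommitLimit and Committed_AS fields from /proc/meminfo; the helper
names are made up. Note that CommitLimit is only enforced when
vm.overcommit_memory is set to 2, so this is a possible lead, not a
diagnosis.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Most values are in kB; the unit suffix, if any, is ignored here.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """kB still available before the commit limit (hypothetical helper)."""
    return info["CommitLimit"] - info["Committed_AS"]
```

Calling parse_meminfo(open("/proc/meminfo").read()) while the fork
errors are occurring, and comparing the headroom with a reading taken
right after a restart, would show whether committed memory is what
grows over those 2-3 hours.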

Naturally, when this starts happening, the latencies begin to increase
and seem to settle somewhere around 98 seconds. Interestingly enough,
this causes the load to drop to nearly nothing.

We have already set the following in nagios.cfg:

service_reaper_frequency=2
use_large_installation_tweaks=1
enable_environment_macros=0

If we enable the embedded perl interpreter, the forking issues happen
much more quickly after restart (minutes instead of hours).

The nagios user's ulimits look like this:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 65600
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 32768
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65600
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
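
The limits above come from a shell, and a daemon started from an init
script may not inherit the same values. As a cross-check, a small
script using Python's standard resource module can report the limits
of whatever process runs it (for example, via a test command run as
the nagios user). This is a sketch assuming Linux; the LIMITS table
and function names are made up. Note that getrlimit reports the stack
and virtual memory limits in bytes, whereas ulimit shows kbytes.

```python
import resource

# Subset of the limits shown above, keyed by the ulimit description.
LIMITS = {
    "open files (-n)": resource.RLIMIT_NOFILE,
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "virtual memory (-v)": resource.RLIMIT_AS,
    "stack size (-s)": resource.RLIMIT_STACK,
}


def format_limit(value):
    """Render a raw rlimit value the way ulimit does for 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {description: (soft, hard)} for the calling process."""
    return {
        name: tuple(format_limit(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }
```

If the soft values printed from inside the daemon's environment
differ from the ulimit output above, that would be worth chasing
before anything else.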

We'll likely give 3.0.6 a try soon to see if this magically fixes the
issue even though the changelog doesn't indicate anything obviously
relevant.

-- 
Jeff Frost, Owner       <[email protected]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411     FAX: 916-405-4032

