-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/02/09 12:27 PM, Jeff Frost wrote:
> I've got a Nagios-3.0.4 server monitoring 3,290 services on 387
> hosts. When the nagios service is initially started, service and host
> latency is great. This usually continues for about 2-3 hours and then
> we start seeing fork errors in the log like so:
>
> [1234425582] Warning: The check of service 'ssh' on host 'mail02' could
> not be performed due to a fork() error: 'Cannot allocate memory'. The
> check will be rescheduled.
>
> At about the same time, we start seeing lots of orphaned
> /tmp/checkXXXXXX files and indications that the max concurrent checks
> value has been reached:
>
> [1234458853] Max concurrent service checks (500) has been reached.
> Delaying further checks until previous checks are complete...
>
> It should be noted that during this time period, there is 2GB of free
> memory and 1.2GB of cache available out of the 4GB on the nagios server,
> so I'm thinking it has to be something besides system RAM that's exhausted.
>
> Naturally, when this starts happening, the latencies begin to increase
> and seem to settle somewhere around 98seconds and interestingly enough,
> this causes the load to drop to nearly nothing.
>
> We have already set the following in nagios.cfg:
>
> service_reaper_frequency=2
> use_large_installation_tweaks=1
> enable_environment_macros=0
>
> If we enable the embedded perl interpreter, the forking issues happen
> much more quickly after restart (minutes instead of hours).

Which OS/distribution are you running? How much RAM and swap does the
server have, and how much of each is free? Please send the output of
"free -m" both with and without Nagios running. Also send the RSS size
of the Nagios process right after start, and again once you get the
fork errors.

Nagios 3 does leak some memory, especially when the embedded Perl
interpreter (ePN) is in use. However, unless your server is really
short on RAM, it shouldn't be a huge problem.

If you're stuck with low-end hardware, make sure to run the server
without its graphical interface and disable as many daemons as
possible. A slim Linux distribution like Slackware (if you use Linux)
could also help.

Another setting that could help is limiting check parallelization,
though it has been reported that there may be a problem with it on
Nagios 3 (it hasn't been confirmed AFAIK).

- --
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJluyT6dZ+Kt5BchYRAo67AKCGGhi+EzKbxNvkMuzOkYOqsQDG3ACgqIG9
9jlBUwg6O2pM6vWA7qQdNTs=
=l5Hz
-----END PGP SIGNATURE-----
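
In case it helps with collecting the numbers requested above (free
RAM, swap, and the RSS of the Nagios process), here is a minimal
sketch. It assumes a Linux host with /proc mounted, which the thread
has not confirmed. The script and its function names (parse_kb_fields,
memory_report) are made up for illustration and are not part of
Nagios; the PID of the Nagios daemon has to be supplied by hand.

```python
#!/usr/bin/env python3
"""Print memory figures in MB, similar to `free -m`, plus one process's RSS.

Assumption: Linux with /proc mounted. Not part of Nagios.
"""
import sys


def parse_kb_fields(text):
    """Parse 'Key:   12345 kB' style lines into a {key: int} dict.

    Lines whose value is not an integer (e.g. 'Name: nagios') are skipped.
    For the fields used below, the kernel reports the value in kB.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def memory_report(meminfo_text, status_text):
    """Return the figures asked for in the thread, converted from kB to MB.

    meminfo_text: contents of /proc/meminfo
    status_text:  contents of /proc/<pid>/status for the process of interest
    A field that is missing from the input is reported as None.
    """
    mem = parse_kb_fields(meminfo_text)
    proc = parse_kb_fields(status_text)
    in_kb = {
        "mem_total_mb": mem.get("MemTotal"),
        "mem_free_mb": mem.get("MemFree"),
        "cached_mb": mem.get("Cached"),
        "swap_total_mb": mem.get("SwapTotal"),
        "swap_free_mb": mem.get("SwapFree"),
        "process_rss_mb": proc.get("VmRSS"),
    }
    return {name: (kb // 1024 if kb is not None else None)
            for name, kb in in_kb.items()}


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: memreport.py <pid-of-nagios-process>")
    pid = sys.argv[1]
    with open("/proc/meminfo") as f:
        meminfo_text = f.read()
    with open("/proc/%s/status" % pid) as f:
        status_text = f.read()
    for name, value in memory_report(meminfo_text, status_text).items():
        print("%-16s %s" % (name, value))
```

Running it once right after Nagios starts and again when the fork
errors appear gives the two RSS readings to compare. The output of
"free -m" itself is still the simplest thing to paste into a reply.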
