On Sat, 14 Feb 2009, Thomas Guyot-Sionnest wrote:

> On 12/02/09 12:27 PM, Jeff Frost wrote:
>> I've got a Nagios-3.0.4 server monitoring 3,290 services on 387
>> hosts. When the nagios service is initially started, service and host
>> latency is great.  This usually continues for about 2-3 hours and then
>> we start seeing fork errors in the log like so:
>>
>> [1234425582] Warning: The check of service 'ssh' on host 'mail02' could
>> not be performed due to a fork() error: 'Cannot allocate memory'.  The
>> check will be rescheduled.
>>
>> At about the same time, we start seeing lots of orphaned
>> /tmp/checkXXXXXX files and indications that the max concurrent checks
>> value has been reached:
>>
>> [1234458853] Max concurrent service checks (500) has been reached.
>> Delaying further checks until previous checks are complete...
>>
>> It should be noted that during this time period, there is 2GB of free
>> memory and 1.2GB of cache available out of the 4GB on the nagios server,
>> so I'm thinking it has to be something besides system RAM that's exhausted.
>>
>> Naturally, when this starts happening, the latencies begin to increase
>> and seem to settle somewhere around 98 seconds; interestingly enough,
>> this causes the load to drop to nearly nothing.
>>
>> We have already set the following in nagios.cfg:
>>
>> service_reaper_frequency=2
>> use_large_installation_tweaks=1
>> enable_environment_macros=0
>>
>> If we enable the embedded perl interpreter, the forking issues happen
>> much more quickly after restart (minutes instead of hours).
>
> Which OS/distribution are you running? How much RAM do you have? Free
> RAM? SWAP?
>
> Please send results of "free -m" with and without Nagios running.
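To see how the fork() errors and the max-concurrent-checks warnings line up in time after a restart, the log can be bucketed by hour. The bracketed number at the start of each log entry is a Unix timestamp (1234425582 is 2009-02-12 07:59:42 UTC). This is a minimal sketch, assuming each entry occupies a single line of nagios.log (the mailer wrapped the ones quoted above) and matching on substrings copied from those warnings; the script, the `scan` function and the hourly bucket size are hypothetical, not part of Nagios.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timezone

# nagios.log entries look like "[<unix epoch seconds>] <message>".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")

# Substrings copied from the two warnings quoted in this thread.
PATTERNS = {
    "fork_error": "could not be performed due to a fork() error",
    "max_concurrent": "Max concurrent service checks",
}


def scan(lines, bucket_seconds=3600):
    """Count each warning type per time bucket (hourly by default, UTC).

    Returns a Counter keyed by (bucket_start_datetime, pattern_name).
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip wrapped or non-log lines
        ts, msg = int(m.group(1)), m.group(2)
        for name, needle in PATTERNS.items():
            if needle in msg:
                start = ts - ts % bucket_seconds
                bucket = datetime.fromtimestamp(start, tz=timezone.utc)
                counts[(bucket, name)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_nagios_log.py /path/to/nagios.log
    with open(sys.argv[1], errors="replace") as fh:
        for (bucket, name), n in sorted(scan(fh).items()):
            print(f"{bucket:%Y-%m-%d %H:%M} UTC  {name:<15} {n}")
```

Point it at the file named by `log_file` in nagios.cfg; a jump in the `fork_error` count two to three hours after a restart would match the pattern described above.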

Gentoo Linux.

With nagios running (and fork errors happening):

# free -m
              total       used       free     shared    buffers     cached
Mem:          4096       1421       2674          0         51       1078
-/+ buffers/cache:        292       3803
Swap:          511          0        511

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    4808  6.8  0.2  72716 11916 ?        Rsl  02:00   2:47 /usr/sbin/nagios -d /etc/nagios/nagios.cfg

Without nagios running:

# free -m
              total       used       free     shared    buffers     cached
Mem:          4096       1364       2732          0         51       1073
-/+ buffers/cache:        238       3857
Swap:          511          0        511

Immediately after nagios start:

# free -m
              total       used       free     shared    buffers     cached
Mem:          4096       1386       2709          0         51       1079
-/+ buffers/cache:        255       3840
Swap:          511          0        511
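The "-/+ buffers/cache" line is the one that matters here: its "used" column is Mem used minus buffers and cache (for the fork-error snapshot, 1421 - 51 - 1078 = 292 MB). Below is a small sketch comparing the snapshots above against the no-Nagios baseline, assuming the older procps `free` output format shown here (newer procps-ng releases drop the "-/+ buffers/cache" line in favour of an "available" column); the helper name `app_used_mb` is hypothetical.

```python
# `free -m` snapshots copied from this thread (older procps output format).
WITHOUT_NAGIOS = """\
             total       used       free     shared    buffers     cached
Mem:          4096       1364       2732          0         51       1073
-/+ buffers/cache:        238       3857
Swap:          511          0        511
"""

AFTER_START = """\
             total       used       free     shared    buffers     cached
Mem:          4096       1386       2709          0         51       1079
-/+ buffers/cache:        255       3840
Swap:          511          0        511
"""

WITH_FORK_ERRORS = """\
             total       used       free     shared    buffers     cached
Mem:          4096       1421       2674          0         51       1078
-/+ buffers/cache:        292       3803
Swap:          511          0        511
"""


def app_used_mb(free_output: str) -> int:
    """Return 'used' from the '-/+ buffers/cache' line, i.e. memory in use
    once the kernel's buffers and page cache are excluded."""
    for line in free_output.splitlines():
        if line.startswith("-/+ buffers/cache:"):
            return int(line.split(":", 1)[1].split()[0])
    raise ValueError("no '-/+ buffers/cache' line; newer free prints 'available' instead")


baseline = app_used_mb(WITHOUT_NAGIOS)
for label, snapshot in [("right after start", AFTER_START),
                        ("with fork errors", WITH_FORK_ERRORS)]:
    # Growth relative to the machine with Nagios stopped.
    print(f"{label}: +{app_used_mb(snapshot) - baseline} MB over baseline")
```

It prints +17 MB right after start and +54 MB with fork errors, so roughly 37 MB of growth between the two. Because `free -m` rounds each column independently, recomputing used - buffers - cached from the Mem line can differ from the printed figure by a megabyte or two (1364 - 51 - 1073 = 240 versus 238 in the baseline).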


>
> Also send the RSS size of the Nagios process after start, and once you
> get the fork errors.
>
> Nagios 3 is leaking some memory, especially when using the ePN. However
> unless your server is really short on RAM it shouldn't be a huge problem.
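To answer the RSS question over time rather than at two points, the resident set size can be polled from /proc. In the ps output above, the RSS column shows 11916 (kB) for PID 4808. A minimal sketch, assuming Linux (it reads the VmRSS field of /proc/<pid>/status, which is also reported in kB); the script and function names, the 5-minute interval and the 24-hour duration are illustrative choices.

```python
import sys
import time
from pathlib import Path


def parse_vm_rss_kb(status_text: str) -> int:
    """Return VmRSS (resident set size, kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t   11916 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS field (process gone or a kernel thread?)")


def sample_rss(pid: int, interval: float, count: int):
    """Yield (unix_time, rss_kb) tuples, `count` samples `interval` seconds apart."""
    status = Path(f"/proc/{pid}/status")
    for i in range(count):
        yield int(time.time()), parse_vm_rss_kb(status.read_text())
        if i + 1 < count:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rss_watch.py <nagios_pid>
    pid = int(sys.argv[1])
    # Every 5 minutes for 24 hours (288 samples).
    for ts, rss_kb in sample_rss(pid, interval=300.0, count=288):
        # Same "[epoch]" prefix style as nagios.log, to ease correlation.
        print(f"[{ts}] pid={pid} VmRSS={rss_kb} kB", flush=True)
```

Pass the Nagios daemon's PID (for example `python rss_watch.py 4808`) and let it run through the two-to-three-hour window; a steady climb in VmRSS ahead of the fork errors would point toward the leak mentioned above, while a flat line would suggest looking elsewhere.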

I've played with various max connection settings, but it didn't seem to make 
any difference.


-- 
Jeff Frost, Owner       <[email protected]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411     FAX: 916-405-4032

_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting 
any issue. 
::: Messages without supporting info will risk being sent to /dev/null
