On Sat, 14 Feb 2009, Thomas Guyot-Sionnest wrote:
> On 12/02/09 12:27 PM, Jeff Frost wrote:
>> I've got a Nagios-3.0.4 server monitoring 3,290 services on 387
>> hosts. When the nagios service is initially started, service and host
>> latency is great. This usually continues for about 2-3 hours and then
>> we start seeing fork errors in the log like so:
>>
>> [1234425582] Warning: The check of service 'ssh' on host 'mail02' could
>> not be performed due to a fork() error: 'Cannot allocate memory'. The
>> check will be rescheduled.
>>
>> At about the same time, we start seeing lots of orphaned
>> /tmp/checkXXXXXX files and indications that the max concurrent checks
>> value has been reached:
>>
>> [1234458853] Max concurrent service checks (500) has been reached.
>> Delaying further checks until previous checks are complete...
>>
>> It should be noted that during this time period, there is 2GB of free
>> memory and 1.2GB of cache available out of the 4GB on the nagios server,
>> so I'm thinking it has to be something besides system RAM that's exhausted.
>>
>> Naturally, when this starts happening, the latencies begin to increase
>> and seem to settle at around 98 seconds. Interestingly enough,
>> this causes the load to drop to nearly nothing.
>>
>> We have already set the following in nagios.cfg:
>>
>> service_reaper_frequency=2
>> use_large_installation_tweaks=1
>> enable_environment_macros=0
>>
>> If we enable the embedded perl interpreter, the forking issues happen
>> much more quickly after restart (minutes instead of hours).
>
> Which OS/distribution are you running? How much RAM do you have? Free
> RAM? SWAP?
>
> Please send results of "free -m" with and without Nagios running.
Gentoo
With nagios running (and fork errors happening):
# free -m
             total       used       free     shared    buffers     cached
Mem:          4096       1421       2674          0         51       1078
-/+ buffers/cache:        292       3803
Swap:          511          0        511
nagios 4808 6.8 0.2 72716 11916 ? Rsl 02:00 2:47 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
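The RSS requested further down can be read straight off the process line above. Here is a minimal Python sketch of that, assuming the line is in the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name parse_ps_aux_line is made up for illustration.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` style line into the fields of interest.

    Assumed column order (not stated in the original message):
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    # maxsplit=10 keeps the command and its arguments together in the last field.
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a ps aux line: %r" % line)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "vsz_kb": int(fields[4]),   # virtual size, in kB
        "rss_kb": int(fields[5]),   # resident set size, in kB
        "command": fields[10],
    }


# The process line from this message, joined back onto one line.
line = ("nagios 4808 6.8 0.2 72716 11916 ? Rsl 02:00 2:47 "
        "/usr/sbin/nagios -d /etc/nagios/nagios.cfg")
info = parse_ps_aux_line(line)
print("RSS: %.1f MB, VSZ: %.1f MB"
      % (info["rss_kb"] / 1024.0, info["vsz_kb"] / 1024.0))
```

Read this way, the Nagios daemon itself is only using about 12 MB of resident memory while the fork errors are occurring.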
Without nagios running:
# free -m
             total       used       free     shared    buffers     cached
Mem:          4096       1364       2732          0         51       1073
-/+ buffers/cache:        238       3857
Swap:          511          0        511
Immediately after nagios start:
# free -m
             total       used       free     shared    buffers     cached
Mem:          4096       1386       2709          0         51       1079
-/+ buffers/cache:        255       3840
Swap:          511          0        511
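To make the three snapshots easier to compare, here is a rough Python sketch that parses this older `free -m` layout (the one with a "-/+ buffers/cache" line) and reports how much memory applications are really using. The helper name parse_free is hypothetical; the two sample blocks are copied from the output above.

```python
def parse_free(text):
    """Parse classic `free -m` output that includes a '-/+ buffers/cache' line."""
    result = {}
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "Mem:":
            keys = ("total", "used", "free", "shared", "buffers", "cached")
            result["mem"] = dict(zip(keys, map(int, parts[1:7])))
        elif parts[0] == "-/+":
            # Line looks like: "-/+ buffers/cache: <used> <free>"
            result["apps_used"] = int(parts[2])
            result["apps_free"] = int(parts[3])
        elif parts[0] == "Swap:":
            keys = ("total", "used", "free")
            result["swap"] = dict(zip(keys, map(int, parts[1:4])))
    return result


RUNNING = """\
total used free shared buffers cached
Mem: 4096 1421 2674 0 51 1078
-/+ buffers/cache: 292 3803
Swap: 511 0 511
"""

STOPPED = """\
total used free shared buffers cached
Mem: 4096 1364 2732 0 51 1073
-/+ buffers/cache: 238 3857
Swap: 511 0 511
"""

running = parse_free(RUNNING)
stopped = parse_free(STOPPED)

# Memory that is free or reclaimable (free + buffers + cached), in MB.
mem = running["mem"]
reclaimable = mem["free"] + mem["buffers"] + mem["cached"]

# Extra application memory in use while Nagios is running, in MB.
delta = running["apps_used"] - stopped["apps_used"]
print("reclaimable: %d MB, extra used with nagios running: %d MB"
      % (reclaimable, delta))
```

The difference between the running and stopped snapshots is a few tens of MB, and swap is untouched in all three, which fits the impression that ordinary RAM is not what is running out.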
>
> Also send the RSS size of the Nagios process after start, and once you
> get the fork errors.
>
> Nagios 3 is leaking some memory, especially when using the ePN. However
> unless your server is really short on RAM it shouldn't be a huge problem.
I've played with various max connection settings, but it didn't seem to make
any difference.
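Since fork() reports 'Cannot allocate memory' while plenty of RAM is free, one thing that might be worth ruling out is a limit other than physical memory. This is only a guess; nothing in the output above confirms it. The Python sketch below compares CommitLimit with Committed_AS from /proc/meminfo (the kernel only enforces that limit when vm.overcommit_memory is 2) and reads the per-user process and address-space limits. The function names and the sample values are invented for illustration and are not from this server.

```python
import resource


def parse_meminfo(text):
    """Return {field: kB} for the /proc/meminfo lines that carry a kB value."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Skip lines without a unit, such as "HugePages_Total: 0".
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before CommitLimit is reached.

    Only meaningful when strict overcommit (vm.overcommit_memory=2) is set.
    """
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


def fork_related_limits():
    """(soft, hard) limits of the current process that can make fork() fail."""
    return {
        "RLIMIT_NPROC": resource.getrlimit(resource.RLIMIT_NPROC),
        "RLIMIT_AS": resource.getrlimit(resource.RLIMIT_AS),
    }


# Made-up sample values, NOT taken from the server discussed in this thread.
SAMPLE = """\
MemTotal:        4194304 kB
CommitLimit:     2620416 kB
Committed_AS:    2500000 kB
HugePages_Total:       0
"""

sample = parse_meminfo(SAMPLE)
print("commit headroom: %d kB" % commit_headroom_kb(sample))
```

On the real machine the text would come from open("/proc/meminfo").read(), and fork_related_limits() would need to be run as the nagios user to show the limits that process actually inherits.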
--
Jeff Frost, Owner <[email protected]>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032
_______________________________________________
Nagios-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue.
::: Messages without supporting info will risk being sent to /dev/null