[EMAIL PROTECTED] wrote on 17.05.2006 20:09:16:

> I am still butting up against very high latency issues with my Nagios
> setup. I feel like I must be missing something obvious because it
> doesn't seem like I have so many services that the servers cannot keep
> up.
>
> nag2: 193/1743
>
> Machine hardware:
> 1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache
> 7200rpm drives

To me this is obviously a performance issue related to hardware. Your machines have far too little RAM. It is simply not possible to run 1800 checks on a 512MB machine in a timely manner.

Think about this: every time Nagios starts a check, it forks a child, which in turn forks the check. Nagios usually uses about 26MB of total memory per process, the check maybe another 5MB. With 1800 checks, in the worst case we are talking about roughly 55 gigabytes of needed RAM spread over 512MB of real RAM. Imagine how often that works without the machines doing a massive amount of swapping and io-wait. I really cannot imagine how such a machine can NOT swap when running Nagios. Are you totally sure that you did not make a mistake when checking the machine?

Here's a snapshot of our dedicated Nagios server, which is a minimal install of RHES4 with only Nagios/Apache running on it (and the HP Insightmanager tools and TSM backup client, but that should not really matter that much ;)):

    top - 11:48:52 up 69 days, 19:10,  1 user,  load average: 0.75, 0.70, 0.67
    Tasks:  53 total,   2 running,  51 sleeping,   0 stopped,   0 zombie
    Cpu(s):  9.3% us,  4.3% sy,  0.0% ni, 62.5% id, 23.9% wa,  0.0% hi,  0.0% si
    Mem:   3116384k total,  2341696k used,   774688k free,    55188k buffers
    Swap:  6291448k total,      144k used,  6291304k free,  2148772k cached

This is an HP DL380, 3.6GHz Xeon with 3GB of RAM and a RAID5. It is currently running "only" 120 hosts with around 500 checks, but those are on a high-frequency schedule (~400 checks per minute), as those are the company-critical services. Therefore it is under real pressure, as you can see from the 2.3GB of memory in use and the 0.75 load with only 500 checks. But I think it is roughly comparable to your setup with three times as many checks.

You should really, really upgrade the RAM in the machines.
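The memory argument above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the 26MB and 5MB figures are the estimates from this mail, the function names are made up for illustration, and it assumes every check is in flight at the same time with no memory shared between processes. Since fork() on Linux shares pages copy-on-write and checks are spread over time, the result is best read as an upper bound.

```python
# Back-of-the-envelope version of the estimate in the mail.
# Both per-process figures are the mail's own rough numbers, not measurements.
NAGIOS_CHILD_MB = 26  # forked Nagios child, per check
CHECK_PLUGIN_MB = 5   # the check plugin itself, per check


def worst_case_mb(concurrent_checks,
                  nagios_mb=NAGIOS_CHILD_MB,
                  check_mb=CHECK_PLUGIN_MB):
    """Memory needed if all checks run at once and share nothing (upper bound)."""
    return concurrent_checks * (nagios_mb + check_mb)


def max_concurrent_checks(ram_mb,
                          nagios_mb=NAGIOS_CHILD_MB,
                          check_mb=CHECK_PLUGIN_MB):
    """How many checks fit into ram_mb under the same pessimistic assumptions."""
    return ram_mb // (nagios_mb + check_mb)


print(worst_case_mb(1800))         # MB needed for 1800 simultaneous checks
print(max_concurrent_checks(512))  # checks that fit into 512MB
```

Under these assumptions 1800 checks come to 55800MB, which is where the "55 gigabytes" comes from, and a 512MB machine has room for about 16 checks at a time before it starts to swap.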
In my opinion a RAM upgrade would solve most of your problems, as I imagine you have a lot of io-wait on this machine (which you can check with an up-to-date top, by the way ;))

regards
Sascha

--
Sascha Runschke
Network Administration
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665

mailto:[EMAIL PROTECTED]

http://www.abit.net
http://www.abit-epos.net
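If the installed top does not show the "wa" column, the io-wait share mentioned above can also be derived from the counters in /proc/stat. The following is a minimal sketch, assuming a Linux 2.6 kernel where the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and steal in that order; the function names are illustrative only.

```python
import time


def parse_cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in io-wait between two samples."""
    # Use only the first eight counters (user..steal); later kernels append
    # guest fields that are already included in user time.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read the counters twice, interval seconds apart, and return io-wait %."""
    with open(path) as stat_file:
        before = parse_cpu_times(stat_file.read())
    time.sleep(interval)
    with open(path) as stat_file:
        after = parse_cpu_times(stat_file.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on the Nagios box while checks are running gives a figure comparable to the "wa" value in the top output; a persistently high value together with swap activity would support the RAM theory.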