I am still butting up against very high latency issues with my Nagios setup. I feel like I must be missing something obvious because it doesn't seem like I have so many services that the servers cannot keep up.
As can be seen from the data below, the server with the most service checks has the highest latency (usually in the neighborhood of 700 seconds! -- this is pre-production). Is my problem really this simple? I have a feeling that it isn't just the number of checks, but I cannot figure out why my latency values are so terrible.

Overview of my setup:

There are 4 servers. Three distributed servers (nag1, nag2, nag3) at 3 distinct geographical locations send all their check information via NSCA to a 4th, central server (nag4). The connections between all of these servers are very high-bandwidth and are nowhere near saturated. The only spot that is unclear to me is the effect that our hardware VPN/tunnels might have; however, the worst-performing server (nag2) is on the same LAN as the central server (nag4).

Nagios v2.2, latest plugins and NRPE/NSCA as of today. I am running embedded Perl with perlcache enabled.

Number of hosts/services:

  nag1:  43 / 130
  nag2: 193 / 1743
  nag3:  78 / 780
  nag4: (central server - active host checks, passive service checks)

Performance info:

  nag1:
  Metric                 Min        Max           Average
  Check Execution Time   0.00 sec   20.04 sec     0.024 sec
  Check Latency          0.00 sec   1.01 sec      0.011 sec
  Percent State Change   0.00%      17.17%        0.01%

  nag2:
  Metric                 Min        Max           Average
  Check Execution Time   0.00 sec   929.13 sec    1.246 sec
  Check Latency          0.00 sec   1180.67 sec   560.462 sec
  Percent State Change   0.00%      55.59%        0.07%

  nag3:
  Metric                 Min        Max           Average
  Check Execution Time   0.00 sec   101.70 sec    0.310 sec
  Check Latency          0.00 sec   602.57 sec    46.023 sec
  Percent State Change   0.00%      0.00%         0.00%

Machine load numbers:

  nag1: load average: 0.05, 0.08, 0.02 / mem: 470 / 512MB physical; not swapping
  nag2: load average: 0.50, 0.61, 0.59 / mem: 330 / 512MB physical; not swapping
  nag3: load average: 0.25, 0.52, 0.56 / mem: 330 / 512MB physical; not swapping

Machine hardware:

  1Us running Fedora Core 4 / P4 2.4GHz / 512MB RAM / 40GB ATA 8MB cache 7200rpm drives

Ok, that is all I can think of off the top of my head.
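A rough way to sanity-check whether check volume alone could explain the latency is to estimate how many checks each server must have in flight at once (check rate times average execution time). The sketch below uses the service counts and average execution times listed above. The 300-second check interval is purely an assumption for illustration and is not taken from my configs, and the function names are hypothetical.

```python
def required_concurrency(num_services, avg_exec_sec, check_interval_sec):
    """Average number of checks that must be running at once.

    checks per second = num_services / check_interval_sec
    concurrency       = checks per second * average execution time
    """
    checks_per_sec = num_services / check_interval_sec
    return checks_per_sec * avg_exec_sec


# ASSUMPTION: every service is checked once per 300 seconds.
# Substitute the real interval from the service definitions.
ASSUMED_INTERVAL_SEC = 300

# (number of services, average check execution time in seconds),
# copied from the figures in this post.
servers = {
    "nag1": (130, 0.024),
    "nag2": (1743, 1.246),
    "nag3": (780, 0.310),
}


def summarize(server_table, interval_sec):
    """Return {server name: required concurrency rounded to 2 places}."""
    return {
        name: round(required_concurrency(count, exec_sec, interval_sec), 2)
        for name, (count, exec_sec) in server_table.items()
    }


if __name__ == "__main__":
    for name, conc in summarize(servers, ASSUMED_INTERVAL_SEC).items():
        print(f"{name}: about {conc} checks in flight on average")
```

Under that assumed interval, nag2 would need only about 7 checks running in parallel on average, which suggests that raw check count by itself may not be the whole story. A different real interval would change the figure proportionally.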
I have reviewed the performance tuning doc (http://nagios.sourceforge.net/docs/2_0/tuning.html), but I am open to trying things again or in a different way. I can list what I've done in response to that doc on a point-by-point basis if anyone is interested.

Thanks for any help -- this latency issue is the last big hurdle before getting this thing going.

~trask
