> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:nagios-users-
> [EMAIL PROTECTED] On Behalf Of Trask
> Sent: Wednesday, May 17, 2006 1:09 PM
> To: nagios-users@lists.sourceforge.net
> Subject: [Nagios-users] How to reduce a very high latency number
> 
> I am still butting up against very high latency issues with my Nagios
> setup.  I feel like I must be missing something obvious because it
> doesn't seem like I have so many services that the servers cannot keep
> up.
> 
> As can be seen from the data below, the server with the most service
> checks has the highest latency (usually in the neighborhood of 700
> seconds! -- this is pre-production).  Is my problem really this
> simple?  I have a feeling that it isn't just the number of checks, but
> I cannot figure out why my latency values are so terrible.
> 
> Overview of my setup:
> 
> There are 4 servers.  3 distributed servers (nag1, nag2, nag3) at 3
> distinct geographical locations send all their check information via
> NSCA to a 4th, central server (nag4).  The connections between all of
> these servers are very high-bandwidth and are nowhere near saturated.
> The only unclear spot to me is the effect that our hardware
> VPN/tunnels might have; however, the worst-performing server (nag2) is
> on the same LAN as the central server (nag4).
> 
> Nagios v2.2, latest plugins and NRPE/NSCA as of today.  I am running
> embedded perl with perlcache enabled.
> 
> 
> Number of hosts/services:
> nag1: 43/130
> nag2: 193/1743
> nag3: 78/780
> nag4: (central server - active host checks, passive service checks)
> 
> Performance Info:
> 
> nag1:
> Metric                  Min         Max           Average
> Check Execution Time:   0.00 sec    20.04 sec     0.024 sec
> Check Latency:          0.00 sec    1.01 sec      0.011 sec
> Percent State Change:   0.00%       17.17%        0.01%
> 
> nag2:
> Check Execution Time:   0.00 sec    929.13 sec    1.246 sec
> Check Latency:          0.00 sec    1180.67 sec   560.462 sec
> Percent State Change:   0.00%       55.59%        0.07%
> 
> nag3:
> Check Execution Time:   0.00 sec    101.70 sec    0.310 sec
> Check Latency:          0.00 sec    602.57 sec    46.023 sec
> Percent State Change:   0.00%       0.00%         0.00%

My first reaction is to question why some checks are taking more than 15
minutes to complete (nag2's maximum check execution time is 929.13 sec)
and why you are allowing them to run that long. I allow a maximum of 60
seconds for any service check to execute --

(from nagios.cfg)
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
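
To illustrate what a limit like service_check_timeout buys you, here is a
minimal Python sketch of running a check command under a hard time limit.
This is not how Nagios itself is implemented: run_check is a hypothetical
helper, and mapping a timeout to CRITICAL is an assumption made for the
example. Only the exit-code numbering follows the usual plugin convention.

```python
import subprocess
import sys

# Exit codes used by Nagios plugins (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(argv, timeout_sec=60):
    """Run a check command, giving up after timeout_sec seconds.

    Hypothetical helper: returns (exit_code, first_line_of_output).
    Reporting a timeout as CRITICAL is an assumption of this sketch.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_sec
        )
    except subprocess.TimeoutExpired:
        # The child is killed once the limit passes, so one hung check
        # cannot occupy a slot for many minutes.
        return CRITICAL, "check timed out after %d seconds" % timeout_sec
    lines = proc.stdout.splitlines()
    return proc.returncode, (lines[0] if lines else "")


if __name__ == "__main__":
    # Stand-in "plugin" that sleeps longer than the allowed time.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_check(slow, timeout_sec=1))
```

The point is only that a check which would otherwise hang is cut off at
the limit instead of running for as long as it likes.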

Some comparable stats from my servers --

PIII 800/512MB 828 Service Checks -

Check Execution Time:   0.13 sec    11.59 sec     7.984 sec
Check Latency:          0.76 sec    15.54 sec     6.583 sec
Percent State Change:   0.00%       6.25%         0.03%

All active checks, load hangs out around 2.

Another box, newer hardware, running nagios + cricket --

2x Dual Core AMD Opteron Processor 275, 2GB RAM, 1260 service checks --

Check Execution Time:   0.04 sec    35.02 sec     6.675 sec
Check Latency:          0.01 sec    38.16 sec     6.692 sec
Percent State Change:   0.00%       9.47%         0.04%

All active checks, load hangs out between 1 and 2.
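
As a quick sanity check, the maximum execution times quoted in this thread
can be compared against a 60-second service_check_timeout. The following
is a rough Python sketch. The numbers are copied from the stats above, and
the over_timeout helper and the dictionary labels are names I made up for
illustration.

```python
SERVICE_CHECK_TIMEOUT = 60.0  # seconds, as in the nagios.cfg excerpt

# Max "Check Execution Time" in seconds, copied from the stats in this thread.
max_exec_sec = {
    "nag1": 20.04,
    "nag2": 929.13,
    "nag3": 101.70,
    "PIII 800": 11.59,
    "Opteron 275": 35.02,
}


def over_timeout(stats, limit):
    """Return (server, seconds) pairs whose slowest check exceeded limit,
    sorted with the worst offender first."""
    slow = {name: sec for name, sec in stats.items() if sec > limit}
    return sorted(slow.items(), key=lambda item: item[1], reverse=True)


for name, sec in over_timeout(max_exec_sec, SERVICE_CHECK_TIMEOUT):
    print("%s: slowest check ran %.2f sec (%.1f min)" % (name, sec, sec / 60))
```

With these figures only nag2 and nag3 are flagged, so those are the boxes
where I would go looking for the slow checks first.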

--
Marc 

