Do you have any task/cron job running every 30 minutes? What is the I/O wait on that VM?
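One way to answer the I/O wait question for the 00:01/03:01/... window is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the growth of the iowait counter with the total. The `wa` column of `vmstat 1` reports the same figure. The sketch below is a minimal illustration that assumes Python 3 and the Linux `/proc/stat` field order user, nice, system, idle, iowait, irq, softirq and optionally steal. The function names are hypothetical helpers, not part of Nagios.

```python
import time

# Fields of the aggregate "cpu" line in /proc/stat, in kernel order.
# Older kernels may stop before "steal"; newer ones append guest fields.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")
IOWAIT = FIELDS.index("iowait")


def parse_cpu_line(line):
    """Return the jiffy counters (user..steal) from the aggregate 'cpu' line."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # guest/guest_nice are already included in user/nice, so they are dropped
    # here to avoid counting that time twice.
    return [int(v) for v in parts[1:1 + len(FIELDS)]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' line samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [new - old for old, new in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[IOWAIT] / total


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the one without a CPU number)."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return line
    raise RuntimeError("no aggregate 'cpu' line in %s" % path)


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait %."""
    first = read_cpu_line(path)
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line(path))
```

Calling `sample_iowait()` in a loop across the minutes around 00:01, 03:01 and so on (or simply watching `vmstat 1` then) would show whether I/O wait spikes at the same time as the failed checks.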
On 20/04/2012, at 15:31, Marki <jm+nagios-us...@roth.lu> wrote:

> Hi,
>
> we have a problem where all the services checked around 00:01, 03:01, 06:01,
> ..., i.e. every three hours one minute after the hour, return a critical soft
> state. Most of the time they go back to normal, but sometimes they also end
> up in a hard state. You can imagine the rest...
>
> We are running Nagios in a virtualized environment (VMware), on a SLES10 VM
> with 3GB of RAM and 4 vCPUs. The average load of the machine is about 5.
>
> We did not succeed in reproducing network trouble when doing basic checks
> around those times from and to other hosts. The VM running Nagios does,
> however, experience some packet loss, even when it runs on completely
> different VMware hosts:
>
> Tue Apr 17 21:02:01 CEST 2012
> 5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
> --
> 5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
> 5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
> --
> Wed Apr 18 09:02:01 CEST 2012
> 5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
> --
> 5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
> --
> 5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
> --
> Wed Apr 18 12:02:01 CEST 2012
> 5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
> --
> Wed Apr 18 15:01:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
> --
> Wed Apr 18 15:02:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
>
> Do you think this is related to Nagios? What could it be?
>
> Here are some Nagios metrics:
>
> Services Actively Checked:
>   <= 1 minute:            0 (0.0%)
>   <= 5 minutes:        2096 (78.3%)
>   <= 15 minutes:       2626 (98.1%)
>   <= 1 hour:           2665 (99.5%)
>   Since program start: 2666 (99.6%)
>
> Metric                Min.      Max.       Average
> Check Execution Time: 0.00 sec  52.15 sec  1.133 sec
> Check Latency:        0.00 sec  3.03 sec   0.183 sec
> Percent State Change: 0.00%     64.54%     1.16%
>
> Check Stats:
> Type                             Last 1 Min  Last 5 Min  Last 15 Min
> Active Scheduled Host Checks             54         282          602
> Active On-Demand Host Checks             25         123          405
> Parallel Host Checks                     56         290          614
> Serial Host Checks                        0           0            0
> Cached Host Checks                       23         115          387
> Passive Host Checks                       0           0            0
> Active Scheduled Service Checks         987        4203        12647
> Active On-Demand Service Checks           0           0            0
> Cached Service Checks                     0           0            0
> Passive Service Checks                    0           0            0
> External Commands                         0           0            0
>
> Thanks
>
> marki
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
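The `0% packet loss` figures in the quoted ping output understate the loss: 4990 of 5000 replies is 0.2%, and 4978 of 5000 is 0.44%. The ping build used there appears to report the percentage as a whole number, so recomputing it from the transmitted/received counts is more informative. A minimal Python 3 sketch, assuming the summary-line format shown in the message; the helper names are hypothetical.

```python
import re

# Matches the ping summary line, e.g.
# "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_counts(line):
    """Return (transmitted, received) from a ping summary line, or None."""
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def _counts_or_raise(line):
    counts = loss_counts(line)
    if counts is None:
        raise ValueError("not a ping summary line: %r" % line)
    return counts


def exact_loss_percent(line):
    """Packet loss as a float percentage, computed from the raw counts."""
    sent, received = _counts_or_raise(line)
    if sent == 0:
        return 0.0
    return 100.0 * (sent - received) / sent


def truncated_loss_percent(line):
    """Loss truncated to a whole percent with integer division (0 below 1%)."""
    sent, received = _counts_or_raise(line)
    if sent == 0:
        return 0
    return (sent - received) * 100 // sent


def lossy_summaries(text):
    """Return (line, exact_percent) for every summary line that lost packets."""
    results = []
    for line in text.splitlines():
        counts = loss_counts(line)
        if counts is not None and counts[0] != counts[1]:
            results.append((line.strip(), exact_loss_percent(line)))
    return results
```

Feeding the grep output from the message to `lossy_summaries` flags every run, each with a loss between 0.02% and 0.44% that the `0%` field hides.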