Hi,

Apologies for the delay, I have been very busy with other things.

Right, I put Nagios into debug mode this morning and reran the tests. I let it get a couple of successful pings to the server, then pulled the network cable from it.

Behaviour is completely different this morning!!! The host check is behaving now and rechecking every 3 minutes, as it's told to in the host template. I got my text and email alerts saying the host was down when I expected them!

But now it's the service check that is running every 1 minute, which it's not told to do when in a problem state. My service template clearly sets a retry_interval of 3 minutes for the problem state:

define service{
        name                            service-server  ; The name of this host template (used above in the checks)
        check_period                    server_24x7     ; Server are monitored at all times
        check_interval                  1               ; Server are checked every 1 minute when in OK state
        retry_interval                  3               ; Server checked every 3 minutes if in problem state
        max_check_attempts              3               ; Server checked 3 times to determine if it's Up or Down state
        notification_period             server_24x7     ; Emails and Text are sent out any time of day
        notification_interval           3               ; Resend Notifications every 3 minutes
        notification_options            c,r             ; Only send alerts for servers in CRITICAL or RECOVERY state
        notifications_enabled           0               ; Notifications are disabled
        contact_groups                  servers email, servers sms ; Alerts sent to contacts in these groups
        event_handler_enabled           1               ; Host event handler is enabled
        process_perf_data               1               ; Performance data is processed
        retain_status_information       1               ; Status Info is kept between server restarts
        retain_nonstatus_information    1               ; Non-Status information is kept between server restarts
        passive_checks_enabled          0               ; Passive Checks are disabled
        obsess_over_service             0               ; We do not obsess over the server if in problem state
        check_freshness                 0               ; We do not check this server for freshness
        flap_detection_enabled          0               ; Flap Detection is disabled
        failure_prediction_enabled      0               ; We will wait for it to actually fail, thank you!!
}

And even though it's checking every minute, it went straight to a hard state on the first check that detected the host down, and it has stayed on check 1/3 (hard state) throughout. I really don't understand what is happening here.

The only difference between this setup and my old Nagios box is the version: the old box was 3.3.1, this new server is 3.4.1. I am using the same config files that worked fine before.

Here are the debug log files from the above testing:

http://dl.dropbox.com/u/895609/nagios.debug1
http://dl.dropbox.com/u/895609/nagios.debug2

If you see anything please let me know, I'm getting angry with all the alerts!!! :-)

Thank you

-----Original Message-----
From: Giorgio Zarrelli [mailto:zarre...@linux.it]
Sent: 29 November 2012 19:24
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios is ignoring the retry_interval setting

Hi,

I do not see anything wrong. Could you set debug_level=-1, repeat the problem and put the log online?

Giorgio

<quota chi="Andrew Thompson">
> Hi Giorgio,
>
> The whole test cfg I am using to try to troubleshoot this can be found at:
>
> http://dl.dropbox.com/u/895609/test.cfg
>
> This is a direct copy of my main servers config but with the rest of
> the servers and some templates for other server checks taken out
>
> Kind Regards
> Andrew
>
> From: Andrew Thompson
> Sent: 29 November 2012 16:11
> To: nagios-users@lists.sourceforge.net
> Subject: Nagios is ignoring the retry_interval setting
>
> Hi,
>
> My nagios box has decided to stop listening to the retry_interval
> entry in my templates.
>
> My server template reads:
>
> define host{
>         name                            host-server
>         check_period                    server_24x7
>         check_interval                  1
>         retry_interval                  3
>         max_check_attempts              3
>         notification_period             server_24x7
>         notification_interval           3
>         notification_options            d,r
>         notifications_enabled           1
>         contact_groups                  servers email, servers sms
>         event_handler_enabled           1
>         process_perf_data               1
>         retain_status_information       1
>         retain_nonstatus_information    1
>         passive_checks_enabled          0
>         obsess_over_host                0
>         check_freshness                 0
>         flap_detection_enabled          0
>         failure_prediction_enabled      0
> }
>
> Now this is what happens:
>
> * Server goes down at 1pm.
> * I check the next scheduled check and it clearly states 1.03pm
> * But at 1.01pm it checks again and then spits out an email and
>   text message saying the server is down.
>
> Completely ignoring the retry_interval setting!!!
>
> I'd expect from the above:
>
> * 1pm server goes down
> * 1.03pm check 2 is done
> * 1.06pm check 3 is done and determined hard state.
> * At 1.06pm the notification should be sent out.
>
> Why is this, is something in my config wrong?
>
> Ubuntu 12.04 desktop and Nagios 3.4.1
>
> Thanks
>
> ------------------------------------------------------------------------------
> Keep yourself connected to Go Parallel:
> VERIFY Test and improve your parallel project with help from experts
> and peers.
> http://goparallel.sourceforge.net
> _______________________________________________
> Nagios-users mailing list
> Nagios-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
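
For reference, the timeline expected in the original message (1pm, 1.03pm, 1.06pm) follows from retry_interval 3 and max_check_attempts 3. Below is a minimal Python sketch of that arithmetic only. It does not reproduce how Nagios 3.4.1 actually schedules checks; the function name expected_check_times and the example date are hypothetical, and it assumes one interval unit is 60 seconds (the usual interval_length setting).

```python
from datetime import datetime, timedelta


def expected_check_times(first_failure, retry_interval, max_check_attempts,
                         interval_length=60):
    """Return the times at which each check attempt is expected to run.

    Hypothetical helper: it models only the expectation described in the
    thread (one retry every retry_interval units until max_check_attempts
    is reached), not the real Nagios scheduler.
    """
    # Assumption: one interval unit equals interval_length seconds.
    step = timedelta(seconds=retry_interval * interval_length)
    return [first_failure + i * step for i in range(max_check_attempts)]


# Example values taken from the thread: failure first seen at 1pm,
# retry_interval 3, max_check_attempts 3. The date itself is arbitrary.
first_failure = datetime(2012, 11, 29, 13, 0)
schedule = expected_check_times(first_failure, retry_interval=3,
                                max_check_attempts=3)

for attempt, when in enumerate(schedule, start=1):
    # Only the final attempt is expected to turn the problem into a hard state.
    state = "HARD" if attempt == len(schedule) else "SOFT"
    print(f"{when:%H:%M} attempt {attempt}/{len(schedule)} {state}")
```

Under these assumptions the last entry (13:06) is the point at which a hard state and the first notification would be expected, which is what the 1.01pm notification and the "1/3 hard state" reading both contradict.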