[Nagios-users] host-down notification can take 50 mins to be sent

stucky Fri, 15 Jun 2007 01:37:45 -0700

Guys

I'm trying the latest stable 2.x version (2.9). On top of the 2 existing
default host templates I added a 3rd one, since the documentation states
that there is no limit.


I added a host and started monitoring. When I took it down, it took between
2 and 5 minutes for the host-down notification to come in.
However, later on I rebooted it again and this time nothing came in. The nagios
log showed nothing about wanting to send a notification either. The box came
back up without any notification.
I took it down again later and waited. After 50 minutes I got a host-down
notification. When I brought the host back up, I almost immediately got a
host-up notification.

I removed one of the templates to change the recursion level of the host
templates from 3 to 2 and tried again. I did 3 tests and all came back fine
this time. I always got the notification within 5 minutes max.
Then I added the 3rd template back again to see whether it had to do with
that, but now I can't reproduce this. I did 2 tests and both were fine.

I don't feel that I can trust nagios now though. I've been using it for a
few years now since version 1.2 and I've never seen this behaviour before.
However, I've also never used more than 1 host/service template. This time I
wanted to make more use of the object inheritance logic to shorten my cfg
but somehow I feel it causes problems.
How deep is the template recursion for most of you folks?

Here are the templates I was using when the 50 min delay happened:

Hosts:

# Host templates

define host{
       name                            generic-host
       notifications_enabled           1
       event_handler_enabled           1
       flap_detection_enabled          1
       failure_prediction_enabled      1
       process_perf_data               1
       retain_status_information       1
       retain_nonstatus_information    1
       notification_period             24x7
       register                        0
       }

define host{
       name                            generic-linux
       use                             generic-host
       check_period                    24x7
       max_check_attempts              10
       check_command                   check-host-alive
       notification_interval           120
       notification_options            d,u,r
       register                        0
       }

define host{
       name                            prod
       use                             generic-linux
       contact_groups                  sysadmins,psst
       register                        0
       }

define host{
       name                            nonprod
       use                             generic-linux
       contact_groups                  sysadmins
       register                        0
       }

Then I use either the prod or nonprod template for all my hosts.
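
To rule the inheritance chain in or out, it can help to flatten it and look at the effective values a host would get. Below is a minimal Python sketch, assuming each template has a single "use" parent as in the definitions above. The parser is a simplification written for this post, not how Nagios itself reads its config, and CONFIG is an abridged copy of the host templates.

```python
import re

# Abridged copy of the host templates from this post (assumption: one
# parent per template, named by the "use" directive).
CONFIG = """
define host{
       name                            generic-host
       notifications_enabled           1
       notification_period             24x7
       register                        0
       }

define host{
       name                            generic-linux
       use                             generic-host
       check_period                    24x7
       max_check_attempts              10
       check_command                   check-host-alive
       notification_interval           120
       notification_options            d,u,r
       register                        0
       }

define host{
       name                            prod
       use                             generic-linux
       contact_groups                  sysadmins,psst
       register                        0
       }
"""


def parse_templates(text):
    """Return {template name: {directive: value}} for every define block."""
    templates = {}
    for body in re.findall(r"define\s+\w+\s*\{(.*?)\}", text, re.S):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        if "name" in directives:
            templates[directives["name"]] = directives
    return templates


def resolve(name, templates):
    """Flatten the "use" chain; values set closer to `name` win."""
    chain, seen, current = [], set(), name
    while current is not None:
        if current in seen:
            raise ValueError("circular use chain at " + current)
        seen.add(current)
        chain.append(current)
        current = templates[current].get("use")
    effective = {}
    for tmpl in reversed(chain):  # apply the root template first
        effective.update(templates[tmpl])
    # name, use and register describe the template itself, so drop them.
    for key in ("name", "use", "register"):
        effective.pop(key, None)
    return chain, effective


chain, effective = resolve("prod", parse_templates(CONFIG))
print(" -> ".join(chain))
for key in sorted(effective):
    print(key, "=", effective[key])
```

If the flattened values for prod and nonprod match what a single hand-written template would contain, the depth of the chain by itself is probably not what changes the notification timing.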

Same with services:

# Service templates

define service{
       name                            generic-service
       active_checks_enabled           1
       passive_checks_enabled          1
       parallelize_check               1
       obsess_over_service             1
       check_freshness                 0
       notifications_enabled           1
       event_handler_enabled           1
       flap_detection_enabled          1
       failure_prediction_enabled      1
       process_perf_data               1
       retain_status_information       1
       retain_nonstatus_information    1
       is_volatile                     0
       register                        0
       }

define service{
       name                            generic-checks
       use                             generic-service
       check_period                    24x7
       max_check_attempts              4
       normal_check_interval           5
       retry_check_interval            1
       notification_options            w,u,c,r
       notification_interval           60
       notification_period             24x7
       register                        0
       }


define service{
       name                            prod
       use                             generic-checks
       contact_groups                  sysadmins,psst
       register                        0
       }

define service{
       name                            nonprod
       use                             generic-checks
       contact_groups                  sysadmins
       register                        0
       }

Here I also use prod or nonprod as templates for my services.

I'm going to test this more tomorrow, but I'm worried that if a host goes down
I might not get notified until 50 minutes later, or maybe never. Who knows?
It doesn't seem to behave the same way every time, but as far as I can see the
service checks run every 5 minutes, so I should get a notification within that
time frame.
Parallel checks are turned on as well.
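
As a rough sanity check on the expected timing, the service template values above can be turned into a worst-case figure for a service reaching its hard state. This is only a sketch: it assumes interval_length is 60 seconds per time unit, ignores check execution time and scheduling latency, and says nothing about when the host check itself runs or when the host notification goes out. The function name is made up for illustration.

```python
def minutes_to_hard_state(normal_check_interval, retry_check_interval,
                          max_check_attempts, interval_length=60):
    """Worst-case minutes from a failure until the service goes hard.

    interval_length is the number of seconds per time unit
    (assumed to be 60 here).
    """
    unit_minutes = interval_length / 60
    # The failure may happen right after a successful check, so up to
    # one full normal interval passes before the first failed check.
    wait_for_first_failed_check = normal_check_interval * unit_minutes
    # The first failed check counts as attempt 1; the remaining
    # attempts are spaced by the retry interval.
    retries = (max_check_attempts - 1) * retry_check_interval * unit_minutes
    return wait_for_first_failed_check + retries


# Values from the generic-checks service template above.
worst_case = minutes_to_hard_state(normal_check_interval=5,
                                   retry_check_interval=1,
                                   max_check_attempts=4)
print(worst_case)
```

Under those assumptions the figure is somewhat over the 5 minute check interval, but still nowhere near 50 minutes.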

Has anyone seen similar delays?

--
stucky


