Guys

I'm testing Nagios 3.0a and I think there may be a notification bug.

I have the following config:

define timeperiod{
       timeperiod_name 24x7
       alias           24 Hours A Day, 7 Days A Week
       sunday          00:00-24:00
       monday          00:00-24:00
       tuesday         00:00-24:00
       wednesday       00:00-24:00
       thursday        00:00-24:00
       friday          00:00-24:00
       saturday        00:00-24:00
       }

define contact{
       name                            generic-contact         ; The name of this contact template
       service_notification_period     24x7                    ; service notifications can be sent anytime
       host_notification_period        24x7                    ; host notifications can be sent anytime
       service_notification_options    w,u,c,r,f,s             ; send notifications for all service states, flapping events, and scheduled downtime events
       host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
       service_notification_commands   notify-service-by-email ; send service notifications via email
       host_notification_commands      notify-host-by-email    ; send host notifications via email
       register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
       }

define contact{
       contact_name                    astuck
       use                             generic-contact
       alias                           SysAdmin1
       email                           {my email}
       }

define contactgroup{
       contactgroup_name       admins
       alias                   SysAdmins
       members                 astuck
       }

define host{
       name                            generic-host    ; The name of this host template
       notifications_enabled           1               ; Host notifications are enabled
       event_handler_enabled           1               ; Host event handler is enabled
       flap_detection_enabled          1               ; Flap detection is enabled
       failure_prediction_enabled      1               ; Failure prediction is enabled
       process_perf_data               1               ; Process performance data
       retain_status_information       1               ; Retain status information across program restarts
       retain_nonstatus_information    1               ; Retain non-status information across program restarts
       notification_period             24x7            ; Send host notifications at any time
       register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
       }

define host{
       name                            generic-linux
       use                             generic-host
       check_period                    24x7
       check_interval                  5
       retry_interval                  1
       max_check_attempts              10
       check_command                   check-host-alive
       notification_interval           120
       notification_options            d,u,r
       register                        0
       }

define host{
       name                            nonprod
       use                             generic-linux
       contact_groups                  admins
       register                        0
       }

define host{
       use                     nonprod
       host_name               lithium
       alias                   Oracle Dev 2
       address                 lithium
       }

As far as I can see, I should get all host and service notifications 24/7.
However, when I reboot 'lithium' I get a host down notification, but when it
comes back I don't get anything.
I turned on notification debugging:

[1181695731.149796:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969
[1181695731.149852:032.0] Notification viability test passed.
[1181695731.149861:032.1] Current notification number: 1
[1181695731.149867:032.2] Creating list of contacts to be notified.
[1181695731.149873:032.1] Host notification will NOT be escalated.
[1181695731.149879:032.2] Adding contact 'astuck' to notification list.
[1181695731.149985:032.2] ** Attempting to notifying contact 'astuck'...
[1181695731.149994:032.2] ** Checking host notification viability for
contact 'astuck'...
[1181695731.150005:032.2] ** Host notification viability for contact
'astuck' PASSED.
[1181695731.150014:032.2] ** Notifying contact 'astuck'
[1181695731.150071:032.2] Raw Command: /usr/bin/printf "%b" "***** Nagios
*****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState:
$HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time:
$LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert:
$HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
[1181695731.150078:032.2] Processed Command: /usr/bin/printf "%b" "*****
Nagios *****\n\nNotification Type: PROBLEM\nHost: lithium\nState:
DOWN\nAddress: lithium\nInfo: (No output returned from host
check)\n\nDate/Time: Tue Jun 12 17:48:51 PDT 2007\n" | /bin/mail -s "**
PROBLEM Host Alert: lithium is DOWN **" {my email}
[1181695731.194505:032.0] No contacts were notified.  Next possible
notification time: Tue Jun 12 19:48:51 2007
[1181695731.194527:032.0] 1 contacts were notified.[1181695741.047809:032.0]
** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1,
Last Notification: Tue Jun 12 17:48:51 2007
[1181695741.047834:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695741.047843:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695741.047850:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695751.160027:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695751.160058:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695751.160068:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695751.160074:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695811.210449:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695811.210479:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695811.210489:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695811.210495:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695821.068538:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695821.068569:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695821.068580:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695821.068586:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695821.068895:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695821.068915:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695821.068924:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695821.068931:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695831.174383:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695831.174418:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695831.174427:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695831.174434:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695831.174731:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695831.174745:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695831.174754:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695831.174760:032.0] Notification viability test failed.  No
notification will be sent out.
[1181695851.144314:032.0] ** Host Notification Attempt ** Host: 'lithium',
Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
[1181695851.144338:032.1] Its not yet time to re-notify the contacts about
this host problem...
[1181695851.144347:032.1] Next acceptable notification time: Tue Jun 12
19:48:51 2007
[1181695851.144354:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696025.034559:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'DISK USAGE /tmp', Type: 0, Current State: 0, Last
Notification: Wed Dec 31 16:00:00 1969
[1181696025.034582:032.1] We shouldn't notify about this recovery.
[1181696025.034589:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696031.130428:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'LOAD', Type: 0, Current State: 0, Last Notification:
Wed Dec 31 16:00:00 1969
[1181696031.130452:032.1] We shouldn't notify about this recovery.
[1181696031.130460:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696031.131081:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'DISK USAGE /usr/local', Type: 0, Current State: 0, Last
Notification: Wed Dec 31 16:00:00 1969
[1181696031.131095:032.1] We shouldn't notify about this recovery.
[1181696031.131102:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696111.052735:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'CFENVD', Type: 0, Current State: 0, Last Notification:
Wed Dec 31 16:00:00 1969
[1181696111.052759:032.1] We shouldn't notify about this recovery.
[1181696111.052766:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696111.052971:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'PERC CONTROLLER', Type: 0, Current State: 0, Last
Notification: Wed Dec 31 16:00:00 1969
[1181696111.052984:032.1] We shouldn't notify about this recovery.
[1181696111.052992:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696111.053334:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'CFEXECD', Type: 0, Current State: 0, Last Notification:
Wed Dec 31 16:00:00 1969
[1181696111.053348:032.1] We shouldn't notify about this recovery.
[1181696111.053355:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696121.163710:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'MEM', Type: 0, Current State: 0, Last Notification: Wed
Dec 31 16:00:00 1969
[1181696121.163738:032.1] We shouldn't notify about this recovery.
[1181696121.163746:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696121.163984:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'DISK USAGE /var', Type: 0, Current State: 0, Last
Notification: Wed Dec 31 16:00:00 1969
[1181696121.163998:032.1] We shouldn't notify about this recovery.
[1181696121.164005:032.0] Notification viability test failed.  No
notification will be sent out.
[1181696141.130999:032.0] ** Service Notification Attempt ** Host:
'lithium', Service: 'DISK USAGE /', Type: 0, Current State: 0, Last
Notification: Wed Dec 31 16:00:00 1969
[1181696141.131023:032.1] We shouldn't notify about this recovery.
[1181696141.131031:032.0] Notification viability test failed.  No
notification will be sent out.
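
In case it helps anyone reading the log above: the bracketed prefix on each
entry looks like epoch seconds plus microseconds, followed by what I take to
be the debug level and verbosity. Below is a small Python sketch (not part of
Nagios) that decodes it. The function name decode_prefix is made up, and the
fixed UTC offsets are an assumption that the server is on US Pacific time. It
also shows that "Wed Dec 31 16:00:00 1969" is just epoch 0, i.e. no
notification has been sent yet.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Nagios server is on US Pacific time. PDT (UTC-7) applied in
# June 2007 and PST (UTC-8) applied on 31 Dec 1969.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")


def decode_prefix(prefix, tz=PDT):
    """Split '[epoch.usec:level.verbosity]' into (local time, level, verbosity)."""
    stamp, _, level = prefix.strip("[]").partition(":")
    seconds = int(stamp.split(".")[0])  # drop the microseconds
    debug_level, verbosity = (int(part) for part in level.split("."))
    return datetime.fromtimestamp(seconds, tz), debug_level, verbosity


when, level, verbosity = decode_prefix("[1181695731.149796:032.0]")
print(when.strftime("%a %b %d %H:%M:%S %Z %Y"), level, verbosity)
# prints: Tue Jun 12 17:48:51 PDT 2007 32 0

# A "Last Notification" of epoch 0 means nothing has been sent yet.
never = datetime.fromtimestamp(0, PST)
print(never.strftime("%a %b %d %H:%M:%S %Y"))
# prints: Wed Dec 31 16:00:00 1969
```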

Clearly, Nagios decided that I shouldn't get a host up notification; I just
don't understand why. From the log files, I'd say the following logic takes
place:

1. Host goes down - service check fails
2. Nagios checks to see if host is down - YES
3. Because of step 2. no service notifications are sent
4. Host down notification is sent instead
5. Host comes back
6. Service checks start recovering - no service recovery notification is
sent since no service problem notifications were sent in the first place.
7. Host is assumed to be up since service is up
8. Hence - no host up notification.
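
To make the steps above concrete, here is a toy Python model of what I think
is going on. It is only my guess written down as code, not how Nagios is
actually implemented; the function name and the flag are made up for
illustration. The flag stands for whether an explicit host check happens
after the reboot.

```python
def simulate(host_rechecked_after_reboot):
    """Return the notifications sent during one reboot, per my hypothesis."""
    sent = []

    # Steps 1-3: a service check fails, the host is found to be down, so the
    # service problem notification is suppressed.
    service_problem_notified = False

    # Step 4: the host down notification goes out instead.
    sent.append("host DOWN")
    host_problem_notified = True

    # Steps 5-6: services recover, but a recovery is only announced if the
    # matching problem was announced first.
    if service_problem_notified:
        sent.append("service RECOVERY")

    # Steps 7-8: the host is only announced as UP if something actually
    # re-checks it; otherwise it is silently assumed to be up.
    if host_rechecked_after_reboot and host_problem_notified:
        sent.append("host UP")

    return sent


print(simulate(host_rechecked_after_reboot=False))  # ['host DOWN']
print(simulate(host_rechecked_after_reboot=True))   # ['host DOWN', 'host UP']
```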

At first I thought my host up notification might not be making it through one
of the notification filters, but according to the log there is NO HOST check
after the reboot, and therefore there is no host notification attempt.
This looks to me like a design bug, but I want to make sure I'm not getting
this wrong. It just doesn't make sense to me that I wouldn't be notified
about a host coming back. I understand the part about the services.

INTERESTING: I have rebooted a few times and it appears that sometimes I do
get host up notifications, but most of the time I don't, so it seems to depend
on when exactly the reboot occurs.
Also, I turned off flap detection globally, but it made no difference.

Has anyone seen this behaviour?
--
stucky