Hi Andy,

How about this, to provide a clue, but maybe add confusion, too: I have four UNIX boxes that I'm monitoring. Two are ongoing and have this problem that I've asked your help with; they send email once an hour. One box sends me email a few times an hour. The fourth box is just fine and never sends me anything - it's always up and has no reason to notify me.

----- Original Message ----
From: Andy Shellam <[EMAIL PROTECTED]>
To: Grant Lowe <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]; nagios-user Mailinglist <nagios-users@lists.sourceforge.net>
Sent: Thursday, October 23, 2008 1:48:33 PM
Subject: Re: [Nagiosplug-help] Host monitoring

Hi Grant,

That is weird - according to that log file, Nagios hasn't notified you at all today (it should say HOST/SERVICE NOTIFICATION for every notification it sends out.)

However, your services are alerting on every OK result - if you convert the timestamps for your ping service you'll notice it's every 5 minutes - which I'm guessing is your service check interval. I have absolutely no idea why Nagios thinks that an OK state is an alert though.

Does anyone with more experience than me have any ideas? (Copied in to nagios-users as it seems more an issue with Nagios than the plugins.) It could be something dead simple but I'm not seeing it!

Thanks,

Andy

Grant Lowe wrote:
> Hi Andy,
>
> This is peculiar. I look at the GUI and it says, from the first day I
> installed Nagios:
>
> Alert Notifications
> File: /usr/local/nagios/var/archives/nagios-09-23-2008-00.log
> Notification detail level for all hosts:
> All notifications, All service notifications, All host notifications,
> Service custom, Service acknowledgements, Service warning, Service unknown,
> Service critical, Service recovery, Service flapping, Host custom,
> Host acknowledgements, Host down, Host unreachable, Host recovery,
> Host flapping
> Older Entries First:
> Host, Service, Type, Time, Contact, Notification Command, Information
> No notifications have been recorded in this archived log file
>
> But if I look at the file in question, I see this:
>
> nagios-09-23-2008-00.log:[1222095978] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.38 ms
> nagios-09-23-2008-00.log:[1222096228] SERVICE ALERT: blarney;ssh;OK;HARD;1;SSH OK - (protocol 1.5)
> nagios-09-23-2008-00.log:[1222096278] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.28 ms
> nagios-09-23-2008-00.log:[1222096578] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
> nagios-09-23-2008-00.log:[1222096878] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.29 ms
> nagios-09-23-2008-00.log:[1222097178] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.27 ms
> nagios-09-23-2008-00.log:[1222097478] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.31 ms
> nagios-09-23-2008-00.log:[1222097778] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.35 ms
> nagios-09-23-2008-00.log:[1222098078] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 3.22 ms
>
> This is weird. Does this help you to help me?
>
> ----- Original Message ----
> From: Andy Shellam <[EMAIL PROTECTED]>
> To: Grant Lowe <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Sent: Thursday, October 23, 2008 11:31:21 AM
> Subject: Re: [Nagiosplug-help] Host monitoring
>
> Hi Grant,
>
> Use the Nagios GUI - it's the "Alert History" option in the Reporting
> menu - navigate back to when you first received the e-mails for that
> host and see what the status change was like. e.g. here's a sample from
> mine when my co-lo host's router had a reboot overnight:
>
> [22-10-2008 01:29:41] HOST ALERT: Telehouse Router 2;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:31] HOST ALERT: Sydney;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:26:41] HOST ALERT: Sydney;UNREACHABLE;HARD;3;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:31] HOST ALERT: Telehouse Router 2;DOWN;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:25:51] HOST ALERT: Sydney;UNREACHABLE;SOFT;2;PING CRITICAL - Packet loss = 100%
>
> You could also look at the Event Log option for the same time period
> which will also list the notifications Nagios sent out:
>
> [22-10-2008 01:29:41] HOST NOTIFICATION: Andy Shellam;Telehouse Router 2;UP;notify-host-problem;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:41] HOST ALERT: Telehouse Router 2;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.34 ms
> [22-10-2008 01:29:31] HOST NOTIFICATION: Andy Shellam;Sydney;UP;notify-host-problem;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:29:31] HOST ALERT: Sydney;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 21.60 ms
> [22-10-2008 01:26:41] HOST NOTIFICATION: Andy Shellam;Sydney;UNREACHABLE;notify-host-problem;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:41] HOST ALERT: Sydney;UNREACHABLE;HARD;3;PING CRITICAL - Packet loss = 100%
> [22-10-2008 01:26:31] HOST NOTIFICATION: Andy Shellam;Telehouse Router 2;DOWN;notify-host-problem;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:26:31] HOST ALERT: Telehouse Router 2;DOWN;HARD;3;CHECK_NRPE: Socket timeout after 10 seconds.
> [22-10-2008 01:25:51] HOST ALERT: Sydney;UNREACHABLE;SOFT;2;PING CRITICAL - Packet loss = 100%
>
> Andy
>
> Grant Lowe wrote:
>
>> Hey Andy,
>>
>> Which file in /usr/local/nagios/var should I be looking at? Is it the log
>> file from the archives directory? If so, then what sort of string should I
>> be looking for?
>>
>> grant
>>
>> ----- Original Message ----
>> From: Andy Shellam <[EMAIL PROTECTED]>
>> To: Grant Lowe <[EMAIL PROTECTED]>
>> Cc: [EMAIL PROTECTED]
>> Sent: Tuesday, October 21, 2008 2:42:43 PM
>> Subject: Re: [Nagiosplug-help] Host monitoring
>>
>> Hi Grant,
>>
>> That's what I was afraid of! Your mail commands are using the
>> $NOTIFICATIONTYPE$ macro which is where your PROBLEM text comes from -
>> in that command definition you can customise the template of the mail
>> that goes out.
>>
>> Unfortunately I have no idea why Nagios is classing a host up state as a
>> problem, instead of a recovery. Can you review the history of that
>> host/service shortly before the alert got to you using the Nagios "Alert
>> History" GUI?
>>
>> What version of Nagios is this on?
>>
>> Thanks,
>>
>> Andy
>>
>> Grant Lowe wrote:
>>
>>> Ok, Andy. Here they are.
>>>
>>> # 'notify-by-email' command definition
>>> define command{
>>>     command_name notify-by-email
>>>     command_line /usr/bin/printf "%b" "***** Nagios @VERSION@ *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | @MAIL_PROG@ -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
>>> }
>>>
>>> # 'notify-host-by-email' command definition
>>> define command{
>>>     command_name notify-host-by-email
>>>     command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
>>> }
>>>
>>> # 'notify-service-by-email' command definition
>>> define command{
>>>     command_name notify-service-by-email
>>>     command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /bin/mail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
>>> }
>>>
>>> Thanks, Andy!
>>>
>>> ----- Original Message ----
>>> From: Andy Shellam <[EMAIL PROTECTED]>
>>> To: Grant Lowe <[EMAIL PROTECTED]>
>>> Cc: [EMAIL PROTECTED]
>>> Sent: Tuesday, October 21, 2008 1:32:09 PM
>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>
>>> Hi Grant,
>>>
>>> Your contact has the commands "notify-service-by-email" and
>>> "notify-host-by-email" set for the notifications. These should be
>>> present in your commands.cfg file, so we need to see the command_line
>>> definitions for each of these commands - this is the server command-line
>>> that is executed to send you the notifications.
>>>
>>> Regards,
>>>
>>> Andy
>>>
>>> Grant Lowe wrote:
>>>
>>>> Hi Andy,
>>>>
>>>> Here's the generic-contact from the template:
>>>>
>>>> define contact{
>>>>     name generic-contact ; The name of this contact template
>>>>     service_notification_period 24x7 ; service notifications can be sent anytime
>>>>     host_notification_period 24x7 ; host notifications can be sent anytime
>>>>     service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
>>>>     host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events
>>>>     service_notification_commands notify-service-by-email ; send service notifications via email
>>>>     host_notification_commands notify-host-by-email ; send host notifications via email
>>>>     register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
>>>> }
>>>>
>>>> As far as command_line definitions that use this one, there aren't any
>>>> that I can see. Unless I'm missing something.
>>>> >>>> >>>> >>>> ----- Original Message ---- >>>> From: Andy Shellam <[EMAIL PROTECTED]> >>>> To: Grant Lowe <[EMAIL PROTECTED]> >>>> Cc: [EMAIL PROTECTED] >>>> Sent: Tuesday, October 21, 2008 12:18:18 PM >>>> Subject: Re: [Nagiosplug-help] Host monitoring >>>> >>>> Hi Grant, >>>> >>>> OK these notification commands aren't defined for your contact - can you >>>> post the definition of the generic-contact contact template, as well as >>>> the command_line definitions for the notification commands attached to >>>> that command? >>>> >>>> Andy >>>> >>>> Grant Lowe wrote: >>>> >>>> >>>> >>>> >>>>> Hi Andy, >>>>> >>>>> Here's the contact info for me in Nagios: >>>>> >>>>> define contact{ >>>>> contact_name nagiosadmin ; >>>>> Short name of user >>>>> use generic-contact ; Inherit >>>>> default values from generic-contact template (defined above) >>>>> alias Nagios Admin ; Full >>>>> name of user >>>>> >>>>> email [EMAIL PROTECTED] ; <<***** >>>>> CHANGE THIS TO YOUR EMAIL ADDRESS ****** >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> ----- Original Message ---- >>>>> From: Andy Shellam <[EMAIL PROTECTED]> >>>>> To: Grant Lowe <[EMAIL PROTECTED]> >>>>> Cc: [EMAIL PROTECTED] >>>>> Sent: Tuesday, October 21, 2008 11:09:56 AM >>>>> Subject: Re: [Nagiosplug-help] Host monitoring >>>>> >>>>> Hi Grant, >>>>> >>>>> What is your definition of the _contact_ glowe? That definition should >>>>> have a service/host notification command attached to it, please send >>>>> those command's command_line definitions. 
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy
>>>>>
>>>>> Grant Lowe wrote:
>>>>>
>>>>>> Hi Andy,
>>>>>>
>>>>>> Here's my host definition:
>>>>>>
>>>>>> define host {
>>>>>>     host_name myhost
>>>>>>     alias myhost
>>>>>>     display_name My Host
>>>>>>     address 172.20.8.215
>>>>>>     hostgroups solaris-servers
>>>>>>     check_command check-host-alive
>>>>>>     initial_state o
>>>>>>     max_check_attempts 5
>>>>>>     check_interval 3
>>>>>>     retry_interval 3600
>>>>>>     active_checks_enabled 0
>>>>>>     passive_checks_enabled 1
>>>>>>     check_period 24x7
>>>>>>     obsess_over_host 0
>>>>>>     check_freshness 0
>>>>>>     event_handler_enabled 0
>>>>>>     flap_detection_enabled 0
>>>>>>     flap_detection_options o,d,u
>>>>>>     process_perf_data 1
>>>>>>     retain_status_information 1
>>>>>>     retain_nonstatus_information 0
>>>>>>     contacts glowe
>>>>>>     notification_interval 300
>>>>>>     notification_period 24x7
>>>>>>     notification_options d,u,r,f,s
>>>>>>     notifications_enabled 1
>>>>>>     stalking_options
>>>>>> }
>>>>>>
>>>>>> Here's my service definition:
>>>>>>
>>>>>> define service{
>>>>>>     host_name blarney
>>>>>>     hostgroup_name solaris-servers
>>>>>>     service_description Ping
>>>>>>     check_command check_ping!200.0,20%!600.0,60%
>>>>>>     max_check_attempts 5
>>>>>>     notification_interval 60
>>>>>>     check_period 24x7
>>>>>> }
>>>>>>
>>>>>> Thanks for the help!
>>>>>>
>>>>>> ----- Original Message ----
>>>>>> From: Andy Shellam <[EMAIL PROTECTED]>
>>>>>> To: Grant Lowe <[EMAIL PROTECTED]>
>>>>>> Cc: [EMAIL PROTECTED]
>>>>>> Sent: Monday, October 20, 2008 1:44:21 PM
>>>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>>>
>>>>>> Hi Grant,
>>>>>>
>>>>>> Have a look at your contact definition, at the service and host
>>>>>> notification commands - look those up in your commands.cfg (or whatever
>>>>>> your command file is) and that should point to a command_line that sends
>>>>>> the e-mail (using /bin/mail or similar.) It may even be a shell
>>>>>> script. Either way, we'd need to see your command definition to try and
>>>>>> work out what's going on here.
>>>>>>
>>>>>> Andy
>>>>>>
>>>>>> Grant Lowe wrote:
>>>>>>
>>>>>>> Hi Andy,
>>>>>>>
>>>>>>> I'm looking at all the command definitions and nothing is in there that
>>>>>>> I can see about retaining PROBLEM data. I do have the notifications
>>>>>>> set to 60 minutes and that's when I receive the email. But it always
>>>>>>> says PROBLEM in the email I receive. Maybe that's the problem? Is
>>>>>>> there a way to set it to a different string? Or is that opening up a
>>>>>>> can of worms?
>>>>>>>
>>>>>>> ----- Original Message ----
>>>>>>> From: Andy Shellam <[EMAIL PROTECTED]>
>>>>>>> To: Grant Lowe <[EMAIL PROTECTED]>
>>>>>>> Cc: [EMAIL PROTECTED]
>>>>>>> Sent: Monday, October 20, 2008 11:34:05 AM
>>>>>>> Subject: Re: [Nagiosplug-help] Host monitoring
>>>>>>>
>>>>>>> Hi Grant,
>>>>>>>
>>>>>>> What are your notification options for the host, and your notification
>>>>>>> command? It's possible that the host/s in question has gone down and
>>>>>>> Nagios is reporting it has returned to an UP state, but your
>>>>>>> notification command is hard-coded to say PROBLEM.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> Grant Lowe wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Another question for you all. On some hosts, I keep on getting a
>>>>>>>> notification that reads:
>>>>>>>>
>>>>>>>> ** PROBLEM Host Alert: myserver is UP **
>>>>>>>>
>>>>>>>> I'm trying to figure out why Nagios is generating these errors, when
>>>>>>>> the host is obviously up. Thanks!
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Nagiosplug-help mailing list
>>>>>>>> [EMAIL PROTECTED]
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/nagiosplug-help
>>>>>>>> ::: Please include plugins version (-v) and OS when reporting any issue.
>>>>>>>> ::: Messages without supporting info will risk being sent to /dev/null

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
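
The two checks Andy describes in this thread (converting the epoch timestamps to see how far apart the ping alerts are, and looking for HOST/SERVICE NOTIFICATION entries) can be sketched roughly as below. This is only an illustration, not part of Nagios: the function names parse_log, count_types and service_intervals are made up for the example, the sample lines are copied from the archived log quoted above with the grep filename prefix stripped, and the field order (host;service;state;state type;attempt;output) is assumed from those same lines.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Sample entries copied from the archived log quoted in this thread
# (the "nagios-09-23-2008-00.log:" grep prefix has been removed).
SAMPLE_LOG = """\
[1222095978] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.38 ms
[1222096228] SERVICE ALERT: blarney;ssh;OK;HARD;1;SSH OK - (protocol 1.5)
[1222096278] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.28 ms
[1222096578] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.61 ms
[1222096878] SERVICE ALERT: blarney;ping;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.29 ms
"""

# "[epoch] ENTRY TYPE: semicolon-separated fields"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_log(text):
    """Return a list of (epoch_seconds, entry_type, fields) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            epoch, entry_type, rest = match.groups()
            entries.append((int(epoch), entry_type, rest.split(";")))
    return entries


def count_types(entries):
    """Count entries per type, e.g. SERVICE ALERT vs SERVICE NOTIFICATION."""
    return Counter(entry_type for _, entry_type, _ in entries)


def service_intervals(entries, host, service):
    """Seconds between consecutive SERVICE ALERT entries for one service."""
    times = [epoch for epoch, entry_type, fields in entries
             if entry_type == "SERVICE ALERT" and fields[:2] == [host, service]]
    return [later - earlier for earlier, later in zip(times, times[1:])]


entries = parse_log(SAMPLE_LOG)
counts = count_types(entries)
intervals = service_intervals(entries, "blarney", "ping")

for epoch, entry_type, fields in entries:
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    print(stamp.strftime("%Y-%m-%d %H:%M:%S UTC"), entry_type, fields[1], fields[2])
print("entry counts:", dict(counts))
print("seconds between ping alerts:", intervals)
```

On the sample lines the ping alerts come out 300 seconds apart, which matches the 5-minute spacing Andy noticed, and no SERVICE NOTIFICATION entries are counted. To run it against a real archive, read the file contents into parse_log instead of SAMPLE_LOG.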