On 8/9/06, Andrew Laden <[EMAIL PROTECTED]> wrote:
One thing to watch is that HOST alerts will get sent out as soon as the host is detected down. You can play with the retry settings, but you generally need to keep those short, because a host check supersedes all other checks and Nagios will essentially pause until it determines the status of the host.
 
You can also play with escalations to delay notifications: send no notifications initially, and then use an escalation to send the alert later. This takes a little work to get right.
 


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Aaron Segura
Sent: Wednesday, August 09, 2006 12:09 PM
Subject: Re: [Nagios-users] controlling notifications a bit better

Normal check interval: 5 min
Retry check interval: 5 min
Max check attempts: 2

-or-

Normal check interval: 2 min
Retry check interval: 1 min
Max check attempts: 9

-or-

(This is the one I run on some services)
Normal check interval: 5 min
Retry check interval: 1 min
Max check attempts: 6

Something along those lines should do it…Yay for math!
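
The math behind the three combinations above can be sketched in a few lines of shell. This is only a rough illustration, assuming the default interval_length of 60 seconds (so intervals are in minutes), that checks run on schedule, and that the failure happens just after a passing check. The function name minutes_to_notify is made up for this example and is not part of Nagios.

```shell
#!/bin/sh
# Worst-case minutes from a failure to the first notification (hypothetical helper).
# Assumption: the failure is first seen one normal interval later, and each
# remaining attempt is spaced by the retry interval.
# Arguments: normal_check_interval, retry_check_interval, max_check_attempts
minutes_to_notify() {
    normal=$1
    retry=$2
    attempts=$3
    echo $(( normal + (attempts - 1) * retry ))
}

minutes_to_notify 5 5 2   # prints 10
minutes_to_notify 2 1 9   # prints 10
minutes_to_notify 5 1 6   # prints 10
```

All three settings give the same 10-minute worst case; they differ in how often the service is polled while it is healthy and how many retries happen before the alert.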

 


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gavin Cato
Sent: Wednesday, August 09, 2006 12:47 AM
To: nagios-users@lists.sourceforge.net
Subject: [Nagios-users] controlling notifications a bit better


Hi,

I want certain hosts/services to only send an email alert if the host/service is down for 10 minutes.

I've tried playing with max_check_attempts and the other obvious parameters but I still get email alerts after only 1-2 mins.

Can anyone please show me a sample config snippet or how they do it?

Cheers

Gav



_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null


As Andrew stated above, it is a bad idea to set max_check_attempts to a high number for hosts, because host checks are not run in parallel. Doing so will cause scheduling problems in Nagios. Instead, let Nagios try to send out the notification immediately and set up a script to intercept and trash this first notification. Then reschedule the next notification for 10 minutes down the road.

Here is how:

# 'host-notify-by-email' command definition
define command{
        command_name    host-notify-by-email
        command_line    $USER1$/eventhandlers/check_notification $NOTIFICATIONNUMBER$ $NOTIFICATIONTYPE$ '/usr/bin/printf "%b" "$HOSTSTATE$ - $HOSTALIAS$\nDuration: $HOSTDURATION$\nDate: $LONGDATETIME$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$ $NOTIFICATIONNUMBER$" | /usr/bin/mailx -s "$NOTIFICATIONTYPE$:$HOSTALIAS$/$HOSTSTATE$" $CONTACTEMAIL$'
        }

Notice how I added the notification number macro in the above command.

Now, create the check_notification script that it calls:

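#!/bin/sh
# Arguments: $1 = $NOTIFICATIONNUMBER$, $2 = $NOTIFICATIONTYPE$,
#            $3 = the real notification command, passed as one quoted string

# Drop the first PROBLEM notification.
if [ "$1" = 1 ] ; then
  if [ "$2" = PROBLEM ] ; then
    exit 0
  fi
# Also drop a RECOVERY that arrives as notification number 2.
elif [ "$1" = 2 ] ; then
  if [ "$2" = RECOVERY ] ; then
    exit 0
  fi
fi
# Anything else is a real notification: run the command that sends the mail.
sh -c "$3"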


What the above does is basically throw away the first notification (which occurs immediately after a host goes down). The setup might seem a little strange, but this method allows you to keep your notification message options inside the Nagios config file.
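
To see which notifications get through, the same filter logic can be dry-run from a shell before wiring it into Nagios. The sketch below wraps the logic in a function (filter_notification is a name invented for this example) and substitutes a harmless echo for the real mail command.

```shell
#!/bin/sh
# Hypothetical dry run of the filter logic from check_notification.
# Arguments: notification number, notification type, command to run.
filter_notification() {
    number=$1
    ntype=$2
    cmd=$3
    if [ "$number" = 1 ] && [ "$ntype" = PROBLEM ]; then
        return 0    # first PROBLEM notification: dropped
    elif [ "$number" = 2 ] && [ "$ntype" = RECOVERY ]; then
        return 0    # RECOVERY numbered 2: dropped
    fi
    sh -c "$cmd"    # everything else is delivered
}

filter_notification 1 PROBLEM  'echo "mail would be sent"'   # prints nothing
filter_notification 2 PROBLEM  'echo "mail would be sent"'   # prints the line
filter_notification 2 RECOVERY 'echo "mail would be sent"'   # prints nothing
filter_notification 3 RECOVERY 'echo "mail would be sent"'   # prints the line
```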

Because the first notification is thrown away, Nagios needs to be told to send another one 10 minutes later.

Do this by adding an event_handler to the host definition:
event_handler           ignore_first_hostpage

Define this eventhandler:
define command{
        command_name    ignore_first_hostpage
        command_line    $USER1$/eventhandlers/host_notification $HOSTSTATE$ $HOSTSTATETYPE$ $HOSTNAME$
        }


Now create the host_notification script which is called in the above command:
#!/bin/sh
# Sample event handler that submits the DELAY_HOST_NOTIFICATION and
# SCHEDULE_HOST_CHECK external commands to Nagios.
# Arguments: $1 = $HOSTSTATE$, $2 = $HOSTSTATETYPE$, $3 = $HOSTNAME$
# Adjust variables to fit your environment as necessary.

# Only take action on hard host states...
case "$2" in
HARD)
        case "$1" in
        DOWN)
                # The host has gone down!
                now=`/usr/bin/perl -e 'printf "%d\n", time;'`
                newpagetime=`expr $now + 600`   # 600 seconds = 10 minutes
                commandfile='/opt/FONnagios/var/rw/nagios.cmd'
                commandline="[$now] DELAY_HOST_NOTIFICATION;$3;$newpagetime"
                commandline2="[$now] SCHEDULE_HOST_CHECK;$3;$newpagetime"
                # Quote the expansions so "[$now]" is never treated as a glob pattern.
                # The delay must be written before the check is scheduled.
                echo "$commandline" >> "$commandfile"
                echo "$commandline2" >> "$commandfile"
                ;;
        esac
        ;;
esac
exit 0


In the above script, it is important to have the host check scheduled after the delay notification command because the notification will not occur until after the next check fails.  If the next check does not fail, and the host recovers, you will receive no notifications.
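
To check what the event handler would write without touching the real command file, the two lines can be printed to the terminal first. This is only a sketch: build_delay_commands and the host name web01 are invented for the example, and the timestamp is a fixed sample value.

```shell
#!/bin/sh
# Hypothetical helper: print the two external-command lines for a host.
# Arguments: host name, current epoch time, delay in seconds.
build_delay_commands() {
    host=$1
    now=$2
    delay=$3
    when=$(( now + delay ))
    # Delay the notification first, then schedule the check for the same time.
    printf '[%s] DELAY_HOST_NOTIFICATION;%s;%s\n' "$now" "$host" "$when"
    printf '[%s] SCHEDULE_HOST_CHECK;%s;%s\n' "$now" "$host" "$when"
}

build_delay_commands web01 1155139200 600
# [1155139200] DELAY_HOST_NOTIFICATION;web01;1155139800
# [1155139200] SCHEDULE_HOST_CHECK;web01;1155139800
```

Once the output looks right, the same lines can be appended to the Nagios command file configured for your installation.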

Mike



