Hello, I'm having a difficult time getting notifications to work the way I would like. I'm using a test host and taking it offline so that Nagios generates alerts. Nagios detects the host coming back online and sends the recovery notification very quickly. To test my config, I then simulate the host failing again a few minutes later. The problem is that it takes Nagios 15-20 minutes to send a notification that the host is down again. This would be useless to me in a production environment; if the host drops again, I need to know about it immediately.
OK, I've been through the docs and have checked everything that seems to make sense in order to figure out this issue -- with no success. I'm running Nagios 3.0.2. Please see some output I've included below to see the time lag between when Nagios notices the host is down again and when it sends the notification.

Thanks!
-Matt

------------------------------------------------------------------------------
Here's my notification entries while testing:
------------------------------------------------------------------------------
Host     Service  Type       Time                 Contact  Notification Command  Information
TEST-01  N/A      HOST UP    10-29-2008 17:04:05  NOC      notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A      HOST UP    10-29-2008 17:04:05  NOC      notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A      HOST UP    10-29-2008 17:04:05  support  notify-host-by-email  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A      HOST UP    10-29-2008 17:04:05  support  notify-host-by-pager  PING OK - Packet loss = 73%, RTA = 0.70 ms
TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  NOC      notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  NOC      notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 17:03:25  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  NOC      notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  NOC      notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:58:05  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST UP    10-29-2008 16:40:10  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A      HOST UP    10-29-2008 16:40:10  NOC      notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A      HOST UP    10-29-2008 16:40:10  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A      HOST UP    10-29-2008 16:40:10  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 4.47 ms
TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  NOC      notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  NOC      notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:27:40  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST UP    10-29-2008 16:10:20  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A      HOST UP    10-29-2008 16:10:20  NOC      notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A      HOST UP    10-29-2008 16:10:20  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A      HOST UP    10-29-2008 16:10:20  support  notify-host-by-pager  PING OK - Packet loss = 0%, RTA = 0.45 ms
TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  NOC      notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  NOC      notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 16:05:48  support  notify-host-by-pager  (Host Check Timed Out)
TEST-01  N/A      HOST UP    10-29-2008 15:44:19  NOC      notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
TEST-01  N/A      HOST UP    10-29-2008 15:44:19  support  notify-host-by-email  PING OK - Packet loss = 0%, RTA = 0.53 ms
TEST-01  N/A      HOST DOWN  10-29-2008 15:21:09  NOC      notify-host-by-email  (Host Check Timed Out)
TEST-01  N/A      HOST DOWN  10-29-2008 15:21:09  support  notify-host-by-email  (Host Check Timed Out)

And here's the host's history:
------------------------------------------------------------------------------
October 29, 2008 17:00
Program Start [10-29-2008 17:07:34] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 17:07:34] Caught SIGHUP, restarting...
Service Ok [10-29-2008 17:04:25] SERVICE ALERT: TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.44 ms
Service Ok [10-29-2008 17:04:15] SERVICE ALERT: TEST-01;TFTP Server;OK;HARD;1;TCP OK - 0.007 second response time on port 8099
Service Ok [10-29-2008 17:04:15] SERVICE ALERT: TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.022 seconds
Host Up [10-29-2008 17:04:05] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 73%, RTA = 0.70 ms

October 29, 2008 16:00
Host Down [10-29-2008 16:58:05] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:56:25] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:54:55] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:53:15] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
Host Down [10-29-2008 16:51:35] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)
Host Down [10-29-2008 16:49:55] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)
Host Down [10-29-2008 16:48:15] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)
Host Down [10-29-2008 16:46:45] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)
Host Down [10-29-2008 16:45:35] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Host Down [10-29-2008 16:45:05] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Program Start [10-29-2008 16:44:55] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:44:55] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:43:30] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Service Critical [10-29-2008 16:42:30] SERVICE ALERT: TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:42:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:42:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Host Down [10-29-2008 16:41:50] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
Service Critical [10-29-2008 16:41:30] SERVICE ALERT: TEST-01;PING;CRITICAL;SOFT;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:41:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:41:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Service Ok [10-29-2008 16:40:20] SERVICE ALERT: TEST-01;PING;OK;SOFT;1;PING OK - Packet loss = 0%, RTA = 0.56 ms
Service Ok [10-29-2008 16:40:10] SERVICE ALERT: TEST-01;TFTP Server;OK;SOFT;1;TCP OK - 0.047 second response time on port 8099
Service Ok [10-29-2008 16:40:10] SERVICE ALERT: TEST-01;HTTP;OK;SOFT;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.034 seconds
Host Up [10-29-2008 16:40:10] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 4.47 ms
Program Start [10-29-2008 16:39:40] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:39:40] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:26:00] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:24:20] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:22:40] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
Host Down [10-29-2008 16:21:10] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)
Host Down [10-29-2008 16:19:30] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)
Host Down [10-29-2008 16:18:00] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)
Host Down [10-29-2008 16:16:20] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)
Host Down [10-29-2008 16:14:40] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Service Critical [10-29-2008 16:13:30] SERVICE ALERT: TEST-01;PING;CRITICAL;HARD;1;PING CRITICAL - Packet loss = 100%
Service Critical [10-29-2008 16:13:20] SERVICE ALERT: TEST-01;TFTP Server;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Service Critical [10-29-2008 16:13:20] SERVICE ALERT: TEST-01;HTTP;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
Host Down [10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
Program Start [10-29-2008 16:12:10] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:12:10] Caught SIGHUP, restarting...
Service Ok [10-29-2008 16:10:20] SERVICE ALERT: TEST-01;PING;OK;HARD;1;PING OK - Packet loss = 0%, RTA = 0.43 ms
Host Up [10-29-2008 16:10:20] HOST ALERT: TEST-01;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.45 ms
Service Ok [10-29-2008 16:10:10] SERVICE ALERT: TEST-01;TFTP Server;OK;HARD;1;TCP OK - 0.005 second response time on port 8099
Service Ok [10-29-2008 16:10:10] SERVICE ALERT: TEST-01;HTTP;OK;HARD;1;HTTP OK HTTP/1.1 200 OK - 403 bytes in 0.013 seconds
Program Start [10-29-2008 16:09:10] Nagios 3.0.2 starting... (PID=1917)
Program Restart [10-29-2008 16:09:10] Caught SIGHUP, restarting...
Host Down [10-29-2008 16:05:48] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:04:18] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:02:38] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:00:58] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
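
To put a number on the lag visible in the history above, here is a rough sketch that measures the time between the first SOFT DOWN entry and the HARD DOWN entry that follows it. It is only an illustration: the regular expression and the function names (parse_alerts, soft_to_hard_delay) are made up for this example and are not part of Nagios, and it assumes the DOWN notification goes out when the host reaches a HARD state, which is consistent with the 16:27:40 notification lining up with the HARD;10 entry. The sample lines are the 16:13:10 to 16:27:40 run copied from the history.

```python
import re
from datetime import datetime

# Excerpt from the host history (newest entry first, as shown in the web UI).
HISTORY = """\
Host Down [10-29-2008 16:27:40] HOST ALERT: TEST-01;DOWN;HARD;10;(Host Check Timed Out)
Host Down [10-29-2008 16:26:00] HOST ALERT: TEST-01;DOWN;SOFT;9;(Host Check Timed Out)
Host Down [10-29-2008 16:24:20] HOST ALERT: TEST-01;DOWN;SOFT;8;(Host Check Timed Out)
Host Down [10-29-2008 16:22:40] HOST ALERT: TEST-01;DOWN;SOFT;7;(Host Check Timed Out)
Host Down [10-29-2008 16:21:10] HOST ALERT: TEST-01;DOWN;SOFT;6;(Host Check Timed Out)
Host Down [10-29-2008 16:19:30] HOST ALERT: TEST-01;DOWN;SOFT;5;(Host Check Timed Out)
Host Down [10-29-2008 16:18:00] HOST ALERT: TEST-01;DOWN;SOFT;4;(Host Check Timed Out)
Host Down [10-29-2008 16:16:20] HOST ALERT: TEST-01;DOWN;SOFT;3;(Host Check Timed Out)
Host Down [10-29-2008 16:14:40] HOST ALERT: TEST-01;DOWN;SOFT;2;(Host Check Timed Out)
Host Down [10-29-2008 16:13:10] HOST ALERT: TEST-01;DOWN;SOFT;1;(Host Check Timed Out)
"""

# Hypothetical pattern for the "HOST ALERT" lines as they appear in this post.
ALERT_RE = re.compile(
    r"\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<kind>SOFT|HARD);(?P<attempt>\d+);"
)


def parse_alerts(text):
    """Return (timestamp, state, kind, attempt) tuples, oldest first."""
    alerts = []
    for m in ALERT_RE.finditer(text):
        ts = datetime.strptime(m["ts"], "%m-%d-%Y %H:%M:%S")
        alerts.append((ts, m["state"], m["kind"], int(m["attempt"])))
    return sorted(alerts)


def soft_to_hard_delay(alerts):
    """Seconds from the first SOFT DOWN (attempt 1) to the next HARD DOWN.

    Returns (seconds, attempt_number_of_hard_state), or None if no complete
    SOFT-to-HARD run is present.
    """
    start = None
    for ts, state, kind, attempt in alerts:
        if state != "DOWN":
            start = None  # a recovery resets the run
            continue
        if kind == "SOFT" and attempt == 1 and start is None:
            start = ts
        elif kind == "HARD" and start is not None:
            return (ts - start).total_seconds(), attempt
    return None


delay, attempts = soft_to_hard_delay(parse_alerts(HISTORY))
print(f"{delay / 60:.1f} minutes over {attempts} attempts")
```

For this run it reports 14.5 minutes over 10 attempts, which matches the gap before the 16:27:40 notification. If the attempt count of 10 comes from the host's max_check_attempts setting (an assumption, since the host definition is not included in this post), that setting and the retry interval between checks would be the first places to look.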