patch handle_trap_timeout: Fixing mon traps

Tom Scanlan Mon, 25 Feb 2002 14:33:31 -0800

this patch fixes "alertafter" and "numalerts" for "traptimeouts".  it is
in reply to the two mails at the bottom.


the two changes haven't seemed to break anything else, but just in case
here are the two changes in english:

1. in "&handle_trap_timeout", $sref->{"_consec_failures"}++ gets the
"alertafter NUM" to work.
2. "&call_alert" doesn't send the alert if we pass it "undef" $output or
$retval, so i substituted reasonable values.
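
A minimal sketch of why both changes are needed. This is not mon's actual code: "timeout_once" and "would_alert" are hypothetical stand-ins, and the gate only models the behaviour described in the two points above (no alert on an undef $output or $retval, and none until the consecutive-failure count reaches "alertafter").

```perl
use strict;
use warnings;

# Hypothetical model of one trap timeout. $patched selects whether the
# consecutive-failure counter is bumped, as the patch does.
sub timeout_once {
    my ($sref, $patched) = @_;
    $sref->{_failure_count}++;
    $sref->{_consec_failures}++ if $patched;
    return $sref;
}

# Hypothetical alert gate: no alert when $output or $retval is undef,
# and none until "alertafter" consecutive failures have been seen.
sub would_alert {
    my ($sref, $alertafter, $output, $retval) = @_;
    return 0 unless defined $output && defined $retval;
    return $sref->{_consec_failures} >= $alertafter ? 1 : 0;
}

my %old = (_failure_count => 0, _consec_failures => 0);
my %new = (_failure_count => 0, _consec_failures => 0);
for (1 .. 2) {
    timeout_once(\%old, 0);   # unpatched: counter never moves
    timeout_once(\%new, 1);   # patched: counter follows the timeouts
}

print "unpatched: ", would_alert(\%old, 2, undef, undef), "\n";
print "patched:   ", would_alert(\%new, 2, "NO OUTPUT", 1), "\n";
```

In this model the unpatched path is blocked twice over (undef arguments and a counter stuck at 0), which is why the patch touches both places.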


now the following works, where before no alert would be sent if the
heartbeat stopped.

watch remote-group
    service heartbeat
        traptimeout 10s
        period wd {Sun-Sat}
            alert test.alert tscanlan
            upalert test.alert -u tscanlan
            alertafter 2
            numalerts 3
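
A rough sketch of what "alertafter 2" and "numalerts 3" should mean for a run of consecutive trap timeouts. It assumes the first alert fires on the 2nd consecutive timeout and that at most 3 alerts are sent per failure. "alerting_timeouts" is a hypothetical helper, not part of mon, and the model ignores "alertevery" and period handling.

```perl
use strict;
use warnings;

# Hypothetical helper: given a number of consecutive trap timeouts,
# return which of them (1-based) would raise an alert under this model.
sub alerting_timeouts {
    my ($timeouts, $alertafter, $numalerts) = @_;
    my ($consec, $sent, @alerted) = (0, 0);
    for my $n (1 .. $timeouts) {
        $consec++;
        next if $consec < $alertafter;   # not enough consecutive failures yet
        next if $sent >= $numalerts;     # alert budget for this failure is used up
        $sent++;
        push @alerted, $n;
    }
    return @alerted;
}

# Six timeouts in a row with the config above: prints "2,3,4"
print join(",", alerting_timeouts(6, 2, 3)), "\n";
```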


-Tom Scanlan
OpenReach, Inc.
Network Operations
office: 732-254-0210 x-6022
cell: 732-682-3365

----
RFP:
-----------------------------------------------------------------------------

Date: Tue, 13 Nov 2001 14:54:22 +0100
From: "Peter Wirdemo (EMW)" <[EMAIL PROTECTED]>
To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
Subject: trap timeout alerts

Hello!

I'm trying to use mon to do heartbeat-style monitoring.

Why don't I get any alerts when the trap times out?
In mon.cgi I get:
Host Group | Service
------------------------------------
syslog     | heartbeat : trap timeout
           | (FAILED,NOALERTS)

NOALERTS??????
Mon Version:
$Id: mon 1.27 Sat, 08 Sep 2001 09:42:05 -0400 trockij $
$ProjectVersion: mon-0-99-2.6 $

Config:

watch syslog
        service heartbeat
                description heartbeat test
                traptimeout 30s
                trapduration 1s
                period wd {Sun-Sat}
                        alertevery 1h
                        no_comp_alerts
                        alert mail.alert me@localhost
                        upalert mail.alert -u me@localhost


Thanks

/Peter


-----------------------------------------------------------------------------

Date: Wed, 30 Jan 2002 12:53:46 -0500
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: alertevery does not work with traps

I'm having problems getting the alertevery variable to work with traps.
I've seen on this mailing list that others have reported that consecutive
failures do not appear to get incremented within the trap handling
subroutine (I have not yet looked at the code myself).  However, I have not
seen any mention of alertevery not working in this scenario.  The alertafter
XXm variable seems to work fine; however, people are getting paged every time
a failure occurs and I desperately need to throttle this back.

Relevant portion of my config....

watch trap-webchat
    service webchat-useragent
        period FIRSTLEVEL: wd {Sun-Sat}
            alert audible.alert
            alertafter 6m
        period SECONDLEVEL: wd {Sun-Sat}
            alert bcmail.alert analyst
            alertafter 15m
            alertevery 10m
        period THIRDLEVEL: wd {Sun-Sat}
            alert bcmail.alert expert
            alertafter 30m
            alertevery 10m
        period CRISIS: wd {Sun-Sat}
            alert bcmail.alert crisis_team
            alertafter 30m
            numalerts 1
        period FOURTHLEVEL: wd {Sun-Sat}
            alert bcmail.alert management
            alertafter 50m
            alertevery 10m


Has anyone successfully gotten traps/alertevery working?

--- mon Mon Feb 25 17:03:21 2002
+++ mon.tom     Mon Feb 25 17:15:34 2002
@@ -3975,6 +3975,7 @@
 
     my $sref = \%{$watch{$group}->{$service}};
     $sref->{"_failure_count"}++;
+    $sref->{"_consec_failures"}++;
     $sref->{"_last_failure"} = $tmnow;
     $sref->{"_first_failure"} = $tmnow if ($sref->{"_op_status"} != $STAT_FAIL);
     set_op_status ($group, $service, $STAT_FAIL);
@@ -3984,7 +3985,7 @@
     push @last_failures, "$group $service $tm $sref->{_last_summary}";
     syslog ('crit', "failure for $last_failures[-1]");
 
-    do_alert ($group, $service, undef, undef, $FL_TRAPTIMEOUT);
+    do_alert ($group, $service, "NO OUTPUT", 1, $FL_TRAPTIMEOUT);
 }
