fyi..
While trap timeouts appear to escalate/invoke alerts now, normal trap
failures ($opstatus='fail') still do not.
Tom Scanlan wrote:
>
>this patch fixes "alertafter" and "numalerts" for "traptimeouts". it is
>in reply to the two mails at the bottom.
>
>the two changes haven't seemed to break anything else, but just in case
>here are the two changes in english:
>
>1. in "&handle_trap_timeout", $sref->{"_consec_failures"}++ gets
>"alertafter NUM" to work.
>2. "&call_alert" doesn't send the alert if we pass it "undef" $output or
>$retval, so i substituted reasonable values.
>
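For anyone who wants the logic without reading the Perl, here is a rough Python sketch of what the two changes amount to. It is not the patch and not mon's code: the class, the method names, and the placeholder values "trap timeout" and 1 are all hypothetical, and it assumes that "alertafter NUM" fires on the NUM-th consecutive failure and that "numalerts NUM" caps alerts until the next success.

```python
class ServiceState:
    """Hypothetical stand-in for mon's per-service state ($sref)."""

    def __init__(self, alertafter, numalerts):
        self.alertafter = alertafter    # consecutive failures needed before alerting
        self.numalerts = numalerts      # max alerts per failure run
        self.consec_failures = 0
        self.alerts_sent = 0

    def call_alert(self, output, retval):
        # Mirrors the behaviour described in change 2: nothing is sent
        # when output or retval is undefined.
        if output is None or retval is None:
            return None
        if self.consec_failures < self.alertafter:
            return None
        if self.alerts_sent >= self.numalerts:
            return None
        self.alerts_sent += 1
        return (output, retval)

    def handle_trap_timeout(self):
        # Change 1: count the timeout as a consecutive failure.
        self.consec_failures += 1
        # Change 2: pass placeholder values instead of None (assumed values).
        return self.call_alert("trap timeout", 1)

    def handle_trap_received(self):
        # A heartbeat arrived, so the failure run ends.
        self.consec_failures = 0
        self.alerts_sent = 0


state = ServiceState(alertafter=2, numalerts=3)
results = [state.handle_trap_timeout() for _ in range(6)]
print(results)
```

With "alertafter 2" and "numalerts 3", the first timeout is silent, the next three alert, and the rest are suppressed until a heartbeat resets the counters.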
>
>now the following works, where before no alert would be sent if the
>heartbeat stopped.
>
>watch remote-group
> service heartbeat
> traptimeout 10s
> period wd {Sun-Sat}
> alert test.alert tscanlan
> upalert test.alert -u tscanlan
> alertafter 2
> numalerts 3
>
>
>-Tom Scanlan
>OpenReach, Inc.
>Network Operations
>office: 732-254-0210 x-6022
>cell: 732-682-3365
>
>----
>RFP:
>-----------------------------------------------------------------------------
>
>
>Date: Tue, 13 Nov 2001 14:54:22 +0100
>From: "Peter Wirdemo (EMW)" <[EMAIL PROTECTED]>
>To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
>Subject: trap timeout alerts
>
>Hello!
>
>I'm trying to use mon to do heartbeat-style monitoring.
>
>Why don't I get any alerts when the trap has timed out?
>In mon.cgi I get:
>Host Group | Service
>------------------------------------
>syslog | heartbeat : trap timeout
> | (FAILED,NOALERTS)
>
>NOALERTS??????
>Mon Version:
>$Id: mon 1.27 Sat, 08 Sep 2001 09:42:05 -0400 trockij $
>$ProjectVersion: mon-0-99-2.6 $
>
>Config:
>
>watch syslog
> service heartbeat
> description heartbeat test
> traptimeout 30s
> trapduration 1s
> period wd {Sun-Sat}
> alertevery 1h
> no_comp_alerts
> alert mail.alert me@localhost
> upalert mail.alert -u me@localhost
>
>
>Thanks
>
>/Peter
>
>
>-----------------------------------------------------------------------------
>
>
>Date: Wed, 30 Jan 2002 12:53:46 -0500
>From: [EMAIL PROTECTED]
>To: [EMAIL PROTECTED]
>Subject: alertevery does not work with traps
>
>I'm having problems getting the alertevery variable to work with traps.
>I've seen on this mailing list that others have reported that consecutive
>failures do not appear to get incremented within the trap handling
>subroutine (I have not yet looked at the code myself). However, I have not
>seen any mention of alertevery not working in this scenario. The
>alertafter XXm variable seems to work fine, but people are getting paged
>every time a failure occurs and I desperately need to throttle this back.
>
>Relevant portion of my config....
>
>watch trap-webchat
> service webchat-useragent
> period FIRSTLEVEL: wd {Sun-Sat}
> alert audible.alert
> alertafter 6m
> period SECONDLEVEL: wd {Sun-Sat}
> alert bcmail.alert analyst
> alertafter 15m
> alertevery 10m
> period THIRDLEVEL: wd {Sun-Sat}
> alert bcmail.alert expert
> alertafter 30m
> alertevery 10m
> period CRISIS: wd {Sun-Sat}
> alert bcmail.alert crisis_team
> alertafter 30m
> numalerts 1
> period FOURTHLEVEL: wd {Sun-Sat}
> alert bcmail.alert management
> alertafter 50m
> alertevery 10m
>
>
>Has anyone successfully gotten traps/alertevery working?
>
>
>(See attached file: fix-mon-traps.patch)
>