On Thu, Dec 13, 2001 at 06:02:43AM -0500, Tom Scanlan wrote:
>
> i'm shooting for something along the lines of getting mon to fail over
> routes, restart a dead process, put a "sorry, service is unavailable"
> page on a web server, or to take some other action based on the service
> you are watching.

I have IIS servers that fail regularly, and all they need is a restart.
_NT_ALERT_ expands to a script that takes the web server out of service
in the load balancer, then telnets to the win2k box and restarts IIS.
When it comes back up, "inser_iis.alert" puts it back in service in the
load balancer. iis.upalert and iis.alert are just mail.alert with
different text in the message.

If this doesn't fix it automatically, the on-call person gets paged after
3 attempts to fix it, i.e. after 4 minutes (first alert, then 2 min, then
2 min). This works quite well, and nobody has to be bothered to do
something a script should be doing.

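The "3 attempts, or 4 minutes" arithmetic can be checked with a small sketch. It assumes checks run exactly every "interval" minutes, that every check in the run fails, and it ignores randskew and alertevery; the function name is made up for illustration and is not part of mon.

```python
def minutes_until_alert(interval_min: int, alertafter: int) -> int:
    """Minutes from the first failed check to the check that triggers the alert.

    Simplifying assumptions: checks are spaced exactly interval_min apart
    and each one fails, so the Nth consecutive failure happens
    (N - 1) intervals after the first.
    """
    if alertafter < 1:
        raise ValueError("alertafter must be at least 1")
    return (alertafter - 1) * interval_min


# Period P1: interval 2m, alertafter 1 (automatic fix on the first failure)
print(minutes_until_alert(2, 1))  # → 0
# Period P2: interval 2m, alertafter 3 (page the on-call person)
print(minutes_until_alert(2, 3))  # → 4
```
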
We have people in a NOC who get paid to do what Mon does for free. I'd
feel threatened if I were them ;)

watch server
    service http
        description take server in/out of service when failed
        interval 2m
        randskew 10s
        monitor http.iis.monitor
        allow_empty_group
        period P1: _ANYTIME_
            alertafter 1
            alertevery 5m summary
            alert _NT_ALERT_
            alert iis.alert ops@work
            comp_alerts
            upalert inser_iis.alert
            upalert iis.upalert ops@work
        period P2: _ANYTIME_
            alertafter 3
            alertevery 72h summary
            alert alert.test [EMAIL PROTECTED]
            alert mail.alert _OPS_PAGER_
            comp_alerts
            upalert alert.test [EMAIL PROTECTED]
            upalert mail.alert _OPS_PAGER_

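For anyone wanting to build something similar, here is a rough sketch of the decision logic behind a pair of scripts like _NT_ALERT_ and inser_iis.alert. It is not the actual script. It assumes mon's usual alert calling convention (-s service, -g group, -h space-separated hosts, -t time, and -u when the script is run as an upalert), and for brevity it folds the alert and the upalert into one program. The step names lb_disable, restart_iis and lb_enable are placeholders, not real commands; the load balancer and telnet parts are left out entirely.

```python
import argparse


def parse_alert_args(argv):
    """Parse the options mon is assumed to pass to an alert script."""
    # add_help=False because -h carries the host list here, not "help".
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-s", dest="service")
    parser.add_argument("-g", dest="group")
    parser.add_argument("-h", dest="hosts", default="")
    parser.add_argument("-t", dest="time")
    parser.add_argument("-u", dest="upalert", action="store_true")
    # Anything else (extra alert arguments) is returned untouched.
    return parser.parse_known_args(argv)


def plan_actions(args):
    """Return (step, host) pairs for each host named in the alert.

    The step names are placeholders for site-specific commands.
    """
    steps = []
    for host in args.hosts.split():
        if args.upalert:
            # Service recovered: put the host back in rotation.
            steps.append(("lb_enable", host))
        else:
            # Service failed: pull the host, then try the restart.
            steps.append(("lb_disable", host))
            steps.append(("restart_iis", host))
    return steps


# Example: a failure alert for two hosts in the "server" group.
args, _ = parse_alert_args(
    ["-s", "http", "-g", "server", "-h", "web1 web2", "-t", "0"]
)
for step, host in plan_actions(args):
    print(step, host)
```

Keeping the planning separate from the execution makes it easy to dry-run the script by hand before letting mon call it.
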
--
Nate Campi http://www.campin.net GnuPG key: 0xC17AEF79
Key fingerprint = BF12 722F 8799 E614 33CC FAB7 5A90 C464 C17A EF79
Your mouse has moved. Windows NT must be restarted
for the change to take effect. Reboot now? [ OK ]