My problem is a bit different: I don't need heartbeat-style monitoring.
I simply want to mark the beginning and the end of a task that I know takes a certain
amount of time to complete.
If it takes too long to complete, I want to be notified.
The real case is this: I want to monitor the backup of a server.
Things should happen this way:
1) begin backup: a Perl script sends a trap to indicate that the backup has started
2) the main backup script does its work
3) end backup: at the end of the main backup script, a call is made to indicate that
the backup has finished.
A trap is sent to the mon server, with the appropriate return code and a summary of the
backup log.
Now, the problem: if the backup takes too long to complete, I want to be
notified.
I tried to do it this way:
--
watch Backup
    service bkServerA
        description Backup serverA
        period
            alert mail.alert [EMAIL PROTECTED]
            upalert mail.alert -u [EMAIL PROTECTED]
            alertafter 3h
            alertevery 24h
--
but the alert is never sent, and neither is the upalert.
I tried all the patches I have seen on the mailing list.
Here is the code I use to signal the start of the backup:
--
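use Mon::Client;

# Connection settings for the mon server (monport is a placeholder
# for the port number).
my $c = Mon::Client->new(
    host     => "monserver",
    port     => monport,
    username => "montrap",
    password => "montrap",
);

# Send the "backup started" trap for the Backup/bkServerA service.
$c->send_trap(
    group    => "Backup",
    service  => "bkServerA",
    retval   => 2,
    opstatus => "fail",
    summary  => "Backup started",
    detail   => "",
);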
--
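For completeness, the end-of-backup trap from step 3 would look roughly like the sketch below. It is only an illustration: the helper names end_trap_args and send_end_trap are hypothetical, the summary strings are made up, and it assumes that mon treats retval 0 with opstatus "ok" as success. The connection settings are the same as in the start-of-backup code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the real script): map the backup's
# exit code and log summary to the fields of the end-of-backup trap.
sub end_trap_args {
    my ($exit_code, $log_summary) = @_;
    my $ok = ($exit_code == 0);
    return (
        group    => "Backup",
        service  => "bkServerA",
        retval   => $exit_code,
        # Assumption: retval 0 / opstatus "ok" is treated as success by mon.
        opstatus => $ok ? "ok" : "fail",
        summary  => $ok ? "Backup finished" : "Backup failed",
        detail   => $log_summary,
    );
}

# Hypothetical wrapper: send the trap with the same connection settings
# as the start-of-backup code. Mon::Client is loaded only here, so
# end_trap_args() can be used without the module installed.
sub send_end_trap {
    my ($port, $exit_code, $log_summary) = @_;
    require Mon::Client;
    my $c = Mon::Client->new(
        host     => "monserver",
        port     => $port,
        username => "montrap",
        password => "montrap",
    );
    return $c->send_trap(end_trap_args($exit_code, $log_summary));
}
```

The backup script would call something like send_end_trap($monport, $exit_code, $log_summary) as its last step, where all three variables are supplied by the backup script itself.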
Any idea?
Roberto T.
-----Original Message-----
From: Ed Ravin [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 19 March 2002 18:25
To: TORRESANI, Roberto
Cc: [EMAIL PROTECTED]
Subject: Re: traptimeout
TORRESANI, Roberto writes:
>
> I'm trying to do something like:
> - do something and let me know when you have finished (with a mon trap)
> - if mon doesn't get the trap in a reasonable period of time,
> the service is considered failed.
Check the mon man page for "traptimeout" and "trapduration".
That will let you do what you want. Here's a snippet from my config:
watch trapthing
    service whereareyou
        description go red if we don't hear from you
        traptimeout 5m
        trapduration 1s
The above service will go into failure mode if a trap is not
received in 5 minutes (traptimeout). After the trap is received,
the service will be marked "OK" after 1 second (trapduration).