--On Wednesday, February 22, 2006 16:46:59 -0600 Nate Reed
<[EMAIL PROTECTED]> wrote:
> I'm not sure if I have set the monitoring parameters correctly for what I
> want to do.
> First question: is the monitoring "interval" the frequency that mon runs
> the monitor, or does it define something else?
I hate to quote the documentation, but from the manual:
    interval timeval
        The keyword interval followed by a time value specifies the frequency
        that a monitor script will be triggered.
So 'interval 30s' means that mon will run the monitor test every 30 seconds.
> It seems like MON is "forgetting" about the previous alert after the
> monitoring interval has elapsed (MON_FIRST_FAILURE and MON_LAST_FAILURE
> are equal even though there were numerous failures). Is that what's
> supposed to happen?
First and last failure should be the same in certain cases, depending on
how long the failure has been happening. MON_FIRST_FAILURE indicates when
the current failure started; MON_LAST_FAILURE indicates when the most
recent monitor test was run. So if your interval is 5 minutes, the two
values will be the same for the five minutes immediately following the
first detection of a failure.
> Ideally, my monitor would run very frequently (every few seconds), but
> the monitoring "interval" would be longer, like 30 minutes. Upon a
> second failure during the monitoring interval, my alert script will try
> to take a different action than on the first failure. Is this possible
> through Mon's configuration (without building this logic in my script)?
You can do this. The interval setting configures the testing behavior; the
alert period definitions configure the alerts (actions) that will occur.
You can have multiple periods with different behaviors for different
failure lengths or different times of day.
For example, look at these two periods:
    period first_action: wd{Sun-Sat}
        alertafter 1
        alert some.alert.script -some -arguments
        numalerts 1

    period second_action: wd{Sun-Sat}
        alertafter 30m
        alert some.other.alert.script -some -arguments
        alertevery 30m
Those would run some.alert.script immediately whenever a failure occurs,
and some.other.alert.script after the failure has been continuous for half
an hour and every half hour after that.
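To show where the interval and the periods sit relative to each other, here
is a rough sketch of a complete watch stanza. The hostgroup name
"webservers", its hosts, the "http" service name, and the 10s interval are
all made up for illustration; http.monitor is one of the monitors shipped
with mon, and the alert script names are the same placeholders as above.
Treat it as a starting point, not a tested configuration.

```
# Hypothetical hostgroup; substitute your own name and hosts.
hostgroup webservers www1 www2

watch webservers
    service http
        # Testing behavior: run the monitor every 10 seconds (example value).
        interval 10s
        monitor http.monitor
        # Alert behavior, first failure: alert once, immediately.
        period first_action: wd{Sun-Sat}
            alertafter 1
            alert some.alert.script -some -arguments
            numalerts 1
        # Alert behavior, failure lasting 30 minutes: escalate and repeat.
        period second_action: wd{Sun-Sat}
            alertafter 30m
            alert some.other.alert.script -some -arguments
            alertevery 30m
```

The point of the layout is that interval belongs to the service and only
controls how often the test runs, while each period carries its own alert
rules, so a short interval and a 30-minute escalation can coexist.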
See the manual for full information on all the alert control semantics that
are available.
-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University