--On Wednesday, February 22, 2006 16:46:59 -0600 Nate Reed <[EMAIL PROTECTED]> wrote:

I'm not sure if I have set the monitoring parameters correctly for what I
want to do.

First question: is the monitoring "interval" the frequency that mon runs
the monitor, or does it define something else?


I hate to quote the documentation, but from the manual:
interval timeval
The keyword interval followed by a time value specifies the frequency that a monitor script will be triggered.

So 'interval 30s' means that mon will run the monitor test every 30 seconds.

It seems like MON is "forgetting" about the previous alert after the
monitoring interval has elapsed (MON_FIRST_FAILURE and MON_LAST_FAILURE
are equal even though there were numerous failures).  Is that what's
supposed to happen?


First and last failure will be the same in certain cases, depending on how long the failure has been happening. First failure indicates when the current failure started; last failure indicates when the most recent monitor test was run. So if your interval is 5 minutes, then for the five minutes immediately following the first detection of a failure, first and last will be the same.
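
To make that concrete, here is a rough sketch of how an alert script could measure how long the current failure has lasted. It is not part of the mon distribution. The function name failure_span is made up for illustration, and the sketch assumes that MON_FIRST_FAILURE and MON_LAST_FAILURE are passed to the alert as integer epoch seconds in the environment.

```python
import os


def failure_span(env=None):
    """Return seconds between first and last failure, or None if unknown.

    Assumption: both variables hold integer epoch seconds as strings.
    """
    env = os.environ if env is None else env
    first = env.get("MON_FIRST_FAILURE")
    last = env.get("MON_LAST_FAILURE")
    if first is None or last is None:
        return None
    # A span of 0 means only one failed test has been seen so far.
    return int(last) - int(first)


if __name__ == "__main__":
    span = failure_span()
    if span is None:
        print("failure timestamps not set")
    else:
        print("failure has lasted %d seconds" % span)
```

A span of zero is what you would see on the first alert of a new failure, which matches the case where first and last are equal.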


Ideally, my monitor would run very frequently (every few seconds), but
the monitoring "interval" would be longer, like 30 minutes.  Upon a
second failure during the monitoring interval, my alert script will try
to take a different action than on the first failure.  Is this possible
through Mon's configuration (without building this logic in my script)?


You can do this. The interval setting configures the testing behavior; the alert period definitions configure the alerts (actions) that will occur. You can have multiple periods with different behaviors for different failure lengths or different times of day.

For example, look at these two periods:

period first_action: wd{Sun-Sat}
 alertafter 1
 alert some.alert.script -some -arguments
 numalerts 1
period second_action: wd{Sun-Sat}
 alertafter 30m
 alert some.other.alert.script -some -arguments
 alertevery 30m


Those would run some.alert.script immediately whenever a failure occurs, and some.other.alert.script after the failure has been continuous for half an hour and every half hour after that.
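
If it helps to see the timeline, the following is a toy model of the behavior described above. It is only a sketch of my description, not mon's actual scheduler: the simulate function and its parameters are hypothetical, and it assumes the failure is detected at time 0 and every later test keeps failing.

```python
def simulate(interval_s, outage_s, after_s=1800, every_s=1800):
    """List (time, script) pairs for alerts fired during one continuous failure.

    Toy model only: tests run at t = 0, interval_s, 2*interval_s, ... and
    all of them fail until outage_s seconds have passed.
    """
    fired = []
    first_sent = False      # models "alertafter 1" with "numalerts 1"
    last_second = None      # time the second-period alert last fired
    t = 0
    while t <= outage_s:
        if not first_sent:
            fired.append((t, "some.alert.script"))
            first_sent = True
        # models "alertafter 30m" followed by "alertevery 30m"
        if t >= after_s and (last_second is None or t - last_second >= every_s):
            fired.append((t, "some.other.alert.script"))
            last_second = t
        t += interval_s
    return fired


if __name__ == "__main__":
    # 5-minute test interval, 65-minute outage
    for when, script in simulate(300, 3900):
        print("%5d s  %s" % (when, script))
```

Under these assumptions, a 65-minute outage tested every 5 minutes gives one run of some.alert.script at 0 seconds and runs of some.other.alert.script at 1800 and 3600 seconds.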


See the manual for full information on all the alert control semantics that are available.

-David Nolan
Network Software Designer
Computing Services
Carnegie Mellon University

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
