Re: alerts functionality

Jim Trocki Tue, 23 Nov 2004 06:59:32 -0800

On Fri, 19 Nov 2004, Joubin Moshrefzadeh wrote:

> host1 goes down - 1 alert sent
> then host2 goes down - 2 alerts sent
> then host3 goes down - 3 alerts sent
> etc...
> 
> so total alerts sent is 1+2+3...+10?
> 
> is the latter correct? I've only tested it up to two hosts going down 
> consecutively :)


it's correct depending on how you configure mon. this is the default
behavior, but you can change it.

i noticed the man page needed some updating, so i did so and checked in the
changes to the cvs tree on the mon-1-0-0pre1 branch. the part which affects
this behavior is the "alertevery" parameter.  here's a summary:


ALERT DECISION LOGIC
       Upon a non-zero or zero exit status, the associated  alert  or  upalert
       program (respectively) is started, pending the following conditions: If
       an alert for a specific service is disabled, do not send an alert.   If
       dep_behavior  is  set  to 'a', and a parent dependency is failing, then
       suppress the alert.  If the alert has previously been acknowledged,  do
       not send the alert, unless it is an upalert.  If an alert is not within
       the specified period, record the failure via syslog(3) and do not  send
       an alert.  If the failure does not fall within a defined period, do not
       send an alert.  No upalerts are sent without corresponding down alerts,
       unless no_comp_alerts is defined in the period section. An upalert will
       only be sent if the previous state is  a  failure.   If  an  alert  was
       already  sent  within  the last alertevery interval and the monitor has
       continued to report a nonzero exit status for a time period  less  than
       that  interval,  do  not  send another alert, unless the summary output
       from the most recent monitor process differs from the previous.  Other-
       wise,  send  an  alert using each alert program listed for that period.
       The observe_detail argument to  alertevery  affects  this  behavior  by
       observing  the  changes in the detail part of the output in addition to
       the summary line.  If a monitor has successive failures and the summary
       output  changes  in each of them, alertevery will not suppress multiple
       consecutive alerts.  The  reasoning  is  that  if  the  summary  output
       changes,  then  a  significant  event  occurred  and the user should be
       alerted.  The "ignore_summary"  option  will  suppress  all  successive
       alerts  while the service continues to fail, even if the summary output
       changes.  If the "strict" alertevery option is used,  then  behave  the
       same  as  if  "ignore_summary" was set, but do not reset the alertevery
       timer when  the  monitor  exits  with  a  zero  status.   For  example,
       "alertevery  24h  strict"  will  only  send  out an alert once every 24
       hours, regardless of whether the monitor output changes, or if the ser-
       vice stops and then starts failing.

...

       alertevery timeval [observe_detail | ignore_summary | strict ]
              The alertevery keyword (within a period  definition)  takes  the
              same  type  of argument as the interval variable, and limits the
              number of times an alert is sent when the service  continues  to
              fail.   For example, if the interval is "1h", then the alerts in
              the period section will only be triggered once every hour as the
              service  continues  to fail.  The alertevery interval timer will
              be reset if the monitor stops exiting with a nonzero exit status
              (i.e. it reports a success).  If the alertevery keyword is omit-
              ted in a period entry, an alert will be sent out  every  time  a
              failure  is  detected.  By default, if the summary output of two
              successive failures changes, then  the  alertevery  interval  is
              overridden,  and  an  alert  will be sent.  The "ignore_summary"
              argument   suppresses   this   behavior.     If    the    string
              "observe_detail" is the last argument, then both the summary and
              detail output lines will be considered when comparing the output
              of  successive  failures.   If  the  string "strict" is the last
              argument, then the output of the monitor or the state change  of
              the  service  will  have no effect on when alerts are sent. That
              is, "alertevery 24h strict" will send only one  alert  every  24
              hours, no matter what.  Please refer to the ALERT DECISION LOGIC
              section for a detailed explanation of how alerts are suppressed.
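
to make the alertevery rules quoted above concrete, here is a rough python
sketch of the suppression decision. this is not mon's actual code (mon is
written in perl). the class and function names are made up for illustration,
and it assumes the timer restarts whenever an alert is actually sent and that
the monitor's summary line lists the failed hosts. periods, dependencies,
acknowledgements and upalerts are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertEvery:
    """Hypothetical model of one 'alertevery' setting (not mon's real code)."""
    interval: float                      # alertevery timeval, in seconds
    mode: str = "default"                # default | observe_detail | ignore_summary | strict
    last_alert: Optional[float] = None   # time the last alert was sent
    last_output: Optional[tuple] = None  # output of the previous failure

    def on_success(self) -> None:
        # A zero exit status resets the timer, except with "strict".
        if self.mode != "strict":
            self.last_alert = None
            self.last_output = None

    def on_failure(self, now: float, summary: str, detail: str = "") -> bool:
        """Return True if an alert should be sent for this failure."""
        # observe_detail compares the detail text in addition to the summary.
        if self.mode == "observe_detail":
            output = (summary, detail)
        else:
            output = (summary,)
        within = self.last_alert is not None and now - self.last_alert < self.interval
        changed = output != self.last_output
        self.last_output = output
        # ignore_summary and strict suppress even when the output changed.
        ignore_changes = self.mode in ("ignore_summary", "strict")
        if within and (ignore_changes or not changed):
            return False
        self.last_alert = now  # assumption: sending an alert restarts the timer
        return True


def count_alerts(mode: str, hosts: int = 10) -> int:
    """Hosts fail one per minute; the summary lists every failed host so far."""
    ae = AlertEvery(interval=3600, mode=mode)
    sent = 0
    for i in range(1, hosts + 1):
        summary = " ".join(f"host{n}" for n in range(1, i + 1))
        if ae.on_failure(now=60 * i, summary=summary):
            sent += 1
    return sent
```

in this model, count_alerts("default") sends an alert on every one of the ten
failures because the summary changes each time, while
count_alerts("ignore_summary") sends only the first one within the hour.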


_______________________________________________
mon mailing list
[EMAIL PROTECTED]
http://linux.kernel.org/mailman/listinfo/mon
