Su Ralph created EAGLE-464:
------------------------------

             Summary: StateCheck: multiple stage of definition in single policy
                 Key: EAGLE-464
                 URL: https://issues.apache.org/jira/browse/EAGLE-464
             Project: Eagle
          Issue Type: New Feature
    Affects Versions: v0.5.0
            Reporter: Su Ralph
            Assignee: Su Ralph
             Fix For: v0.5.0


The requirement for alert states and transitions comes from two real customer 
needs.
Alert de-duplication
"IMO, eagle should do state checks for all services. Eagle should not alert in 
the first attempt itself. Instead it should change the state to SOFT for 2 
tries and then if it is the same state, change the state to HARD and then send 
the alert." - Aroop
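The SOFT/HARD behaviour described in the quote could be sketched roughly as below. This is only an illustration, not Eagle code: the class name SoftHardStateCheck, the observe method, and the choice to send the alert once on the first HARD evaluation (and reset on recovery) are all assumptions.

```python
class SoftHardStateCheck:
    """Hypothetical sketch: hold a failing check in SOFT before escalating to HARD."""

    def __init__(self, soft_tries=2):
        # Number of consecutive failing evaluations that stay SOFT (2 in the quote).
        self.soft_tries = soft_tries
        self._failures = {}  # key -> consecutive failing evaluations

    def observe(self, key, failing):
        """Return (state, send_alert) for one evaluation of the check on `key`."""
        if not failing:
            # Assumption: any healthy evaluation resets the counter.
            self._failures.pop(key, None)
            return "OK", False
        count = self._failures.get(key, 0) + 1
        self._failures[key] = count
        if count <= self.soft_tries:
            return "SOFT", False
        # Assumption: alert only on the SOFT -> HARD transition, not on every HARD.
        return "HARD", count == self.soft_tries + 1
```

With the default of 2 soft tries, the first two failing evaluations return SOFT without alerting and the third returns HARD with the alert flag set.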
Currently, Eagle's alert engine (and also that of UMP) uses a simple 
deduplication spec: a time-based redundancy check (dedupIntervalMin of 
Publishment). This deduplication is not flexible enough to reflect what alerts 
need. A common request is to hold an alert/policy state (an alert state is 
basically the policy state for a given partition value; more on this later) and 
trigger an alert when the state changes. The state change could be defined as:
> The same alert triggers again within time interval M
> N alerts within a given time interval M
NOTE: in this de-duplication mode, no change to the policy itself is 
required.
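As a rough illustration of the "N alerts within time interval M" mode, a sliding-window counter per partition key might look like the following. The names WindowedDeduplicator and should_publish are hypothetical, timestamps are assumed to be seconds, and clearing the window after publishing is an assumption made to avoid re-firing on every later event.

```python
from collections import defaultdict, deque


class WindowedDeduplicator:
    """Hypothetical sketch: publish once N alerts arrive within M seconds."""

    def __init__(self, threshold_n, window_seconds):
        self.threshold_n = threshold_n
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps inside the window

    def should_publish(self, key, timestamp):
        events = self._events[key]
        events.append(timestamp)
        # Drop alerts that have fallen out of the M-second window.
        while events and timestamp - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.threshold_n:
            # Assumption: start counting afresh after a publish.
            events.clear()
            return True
        return False
```

The policy that produces the raw alerts is untouched here; only the publishing decision changes.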
Alert policy defined on transition
One example is the missingblock policy we encountered (alert only when the 
missingblock number changes). There is a more general case with a minor 
difference: given a metric (or a field of a given stream), define value ranges, 
where each range indicates a different state. E.g., for 
perfmon.latency.avg.perpool, define the value-range states as
metric                            value range       state     alert trigger
perfmon.latency.avg.perpool.5min  3000 - Unlimited  FATAL     always (every 5min until FATAL is fixed or the alert is muted explicitly)
                                  1000 - 3000       CRITICAL  on dual transition
                                  100 - 1000        WARN      on dual transition
                                  10 - 50           NORMAL    on worse transition
                                  0 - 10            GOOD      on worse transition
The alert should then be triggered when the state changes, except for FATAL, 
which always triggers.
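One possible reading of the table above, as a minimal sketch rather than a proposed implementation. The semantics of the triggers are assumptions: the trigger is taken from the state being entered, "dual" fires on a change from either direction, "worse" fires only when the new state is more severe than the previous one, and the first observation fires only for "always". Values that fall in no listed range (for example between 50 and 100) keep the previous state. All names are hypothetical.

```python
# (lower bound inclusive, upper bound exclusive, state, trigger), as in the table.
RANGES = [
    (3000.0, float("inf"), "FATAL", "always"),
    (1000.0, 3000.0, "CRITICAL", "dual"),
    (100.0, 1000.0, "WARN", "dual"),
    (10.0, 50.0, "NORMAL", "worse"),
    (0.0, 10.0, "GOOD", "worse"),
]
SEVERITY = {"GOOD": 0, "NORMAL": 1, "WARN": 2, "CRITICAL": 3, "FATAL": 4}


def classify(value):
    """Map a metric value to (state, trigger), or None if no range matches."""
    for low, high, state, trigger in RANGES:
        if low <= value < high:
            return state, trigger
    return None


class RangeStateTracker:
    """Tracks the last state per partition value and decides when to alert."""

    def __init__(self):
        self._last = {}  # partition -> last known state

    def evaluate(self, partition, value):
        """Return (state, send_alert) for one metric sample."""
        match = classify(value)
        if match is None:
            return None, False  # value in an undefined range: keep previous state
        state, trigger = match
        previous = self._last.get(partition)
        self._last[partition] = state
        if trigger == "always":
            return state, True
        if previous is None or previous == state:
            return state, False
        if trigger == "dual":
            return state, True
        # "worse": alert only if the new state is more severe than the previous one.
        return state, SEVERITY[state] > SEVERITY[previous]
```

Under these assumptions a pool moving GOOD to NORMAL to WARN alerts on each step, while dropping back from WARN to NORMAL stays silent.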




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
