[
https://issues.apache.org/jira/browse/EAGLE-464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15422480#comment-15422480
]
Su Ralph commented on EAGLE-464:
--------------------------------
A draft implementation was merged in PR
https://github.com/apache/incubator-eagle/pull/337
> StateCheck: multiple stage of definition in single policy
> ---------------------------------------------------------
>
> Key: EAGLE-464
> URL: https://issues.apache.org/jira/browse/EAGLE-464
> Project: Eagle
> Issue Type: New Feature
> Affects Versions: v0.5.0
> Reporter: Su Ralph
> Assignee: Su Ralph
> Fix For: v0.5.0
>
>
> The requirement for alert states and transitions comes from two real customer
> needs.
> Alert de-duplication
> "IMO, eagle should do state checks for all services. Eagle should not alert
> in the first attempt itself. Instead it should change the state to SOFT for 2
> tries and then if it is the same state, change the state to HARD and then
> send the alert." - Aroop
> Currently, Eagle's alert engine (and also that of UMP) uses a simple
> deduplication spec: a time-based redundancy check (dedupIntervalMin of
> Publishment). This deduplication is not flexible enough to reflect what
> alerts need. A common request is to hold an alert/policy state
> (basically, an alert state is the policy state for a given partition value;
> more on this later) and to trigger an alert when the state changes. The state
> change could be defined as
> > The same alert triggers again within time interval M
> > N alerts within a given time interval M
> NOTE: in this de-duplication mode, no change to the policy itself is
> required.
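
The SOFT/HARD check quoted above could be sketched roughly as follows. This is
only an illustration, not Eagle's implementation: the class name, the
`soft_tries` parameter, and the reading that the alert fires once, on the
evaluation right after the SOFT tries, are all assumptions.

```python
from collections import defaultdict


class SoftHardStateCheck:
    """Hypothetical helper (not part of Eagle's API) tracking SOFT/HARD state."""

    def __init__(self, soft_tries=2):
        # Number of consecutive failing evaluations held in SOFT (assumed: 2).
        self.soft_tries = soft_tries
        # Consecutive failure count per partition value (e.g. per host).
        self.failures = defaultdict(int)

    def observe(self, partition, failing):
        """Return (state, should_alert) for one policy evaluation."""
        if not failing:
            # Any passing evaluation resets the state for this partition.
            self.failures[partition] = 0
            return "OK", False
        self.failures[partition] += 1
        count = self.failures[partition]
        if count <= self.soft_tries:
            return "SOFT", False
        # Alert only on the SOFT -> HARD transition, not on later failures.
        return "HARD", count == self.soft_tries + 1
```

Keeping the counter per partition value matches the note that an alert state
is the policy state for a given partition value.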
> Alert policy defined on transition
> One example is the missingblock policy we encountered (alert only when the
> missingblock number changes). There is a more general case with a minor
> difference: given a metric (or a field of a given stream), define value
> ranges, where each range indicates a different state. For example, for
> perfmon.latency.avg.perpool, define the value range states as
> metric                            value range       state     alert trigger
> perfmon.latency.avg.perpool.5min  3000 - Unlimited  FATAL     always (every 5min until FATAL is fixed or the alert is muted explicitly)
>                                   1000 - 3000       CRITICAL  on dual transition
>                                   100 - 1000        WARN      on dual transition
>                                   10 - 50           NORMAL    on worse transition
>                                   0 - 10            GOOD      on worse transition
> The alert should then be triggered when the state changes, except for FATAL,
> which alerts on every evaluation.
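
A minimal sketch of the range-to-state mapping above, under stated
assumptions. The issue does not define the trigger modes precisely, so this
assumes: the trigger of the state being entered decides; "dual" alerts when
the state is entered from either direction; "worse" alerts only when it is
entered from a less severe state; "always" alerts on every evaluation. Ranges
are treated as half-open, values in the unlisted 50 - 100 gap map to no state,
and the first observation does not alert. All names are hypothetical.

```python
import math

# (low, high, state, trigger), ordered from least to most severe.
# Values are copied from the table in the issue; 50 - 100 is not listed there.
RANGES = [
    (0, 10, "GOOD", "worse"),
    (10, 50, "NORMAL", "worse"),
    (100, 1000, "WARN", "dual"),
    (1000, 3000, "CRITICAL", "dual"),
    (3000, math.inf, "FATAL", "always"),
]
SEVERITY = {state: rank for rank, (_, _, state, _) in enumerate(RANGES)}
TRIGGER = {state: trigger for _, _, state, trigger in RANGES}


def classify(value):
    """Map a metric value to a state; None if no range covers it."""
    for low, high, state, _ in RANGES:
        if low <= value < high:  # assumed half-open interval [low, high)
            return state
    return None


def should_alert(previous, current):
    """Decide whether moving from `previous` to `current` triggers an alert."""
    trigger = TRIGGER[current]
    if trigger == "always":
        return True  # FATAL alerts on every evaluation
    if previous is None or previous == current:
        return False  # assumed: no alert without an actual transition
    if trigger == "dual":
        return True  # entered from a better or a worse state
    # "worse": alert only when the new state is more severe than the old one.
    return SEVERITY[current] > SEVERITY[previous]
```

For example, `should_alert("WARN", "CRITICAL")` and
`should_alert("FATAL", "CRITICAL")` both alert under the "dual" reading, while
`should_alert("WARN", "NORMAL")` does not under the "worse" reading.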
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)