Great, I'll give it a try and post back with the results.

On Monday, June 26, 2017 at 8:15:59 AM UTC-7, [email protected] wrote:
>
> In case anyone else has this problem, the solution involves setting the 
> alert ID to something unique per group. 
>
> AFAIK, this isn't documented very well, but each alert has an associated 
> identifier which is used internally to track the alert state. If you use 
> groupBy to split out a metric stream into multiple alerts, but use the same 
> ID, then a change in the alert condition for any metric stream will cause 
> the alert to change state. 
>
> So for example, in the case of disk usage, you would do something like: 
>
>     |groupBy('host', 'path') 
>     |alert() 
>         .id('{{ .Tags.host }}/{{ .Tags.path }}/disk_usage') 
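>
> Put together with the rest of a task, that might look roughly like the 
> following (untested sketch; the database/measurement names and thresholds 
> are just placeholders, adjust them to your setup): 
>
>     stream 
>         |from() 
>             .database('telegraf') 
>             .measurement('disk') 
>             .groupBy('host', 'path') 
>         // one ID per (host, path) group, so each group tracks its own state 
>         |alert() 
>             .id('{{ .Tags.host }}/{{ .Tags.path }}/disk_usage') 
>             .message('{{ .ID }} is {{ .Level }}: used_percent={{ index .Fields "used_percent" }}') 
>             .warn(lambda: "used_percent" >= 80) 
>             .crit(lambda: "used_percent" >= 90) 
>             .stateChangesOnly() 
>
> With a per-group ID like that, each (host, path) pair keeps its own alert 
> state, so one path going WARN no longer flips the state reported for the 
> others. 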
>
>
>
> On Monday, June 12, 2017 at 6:51:55 PM UTC-4, [email protected] wrote: 
> > From some limited testing, it seems like the problem is that if a 
> particular host (say, 'host1') has a WARN/CRITICAL for a particular (host1, 
> device1) grouping, as well as an OK for a (host1, device2) grouping, then 
> alerts will be generated for both device1 and device2, even though only 
> device1 is in an alert state. 
> > 
> > I've tested this hypothesis on a host that has no groupings in an alert 
> state, and one with just a single grouping in an alert state. The host with 
> no groupings in alert state receives no alerts. 
> > 
> > Can anyone make sense of this? 
> > 
> > I'm using Kapacitor 1.3.1 BTW 
> > 
> > On Monday, June 12, 2017 at 12:17:25 PM UTC-4, [email protected] wrote: 
> > > Any updates on this? 
> > > 
> > > We're having this same problem. Restart Kapacitor or re-define the 
> task, and we get spammed with alerts saying everything is OK (even from hosts 
> which never entered a non-OK state). 
> > > 
> > > Our TICKscript is pretty simple (and very similar to the OP's): 
> > > 
> > > stream 
> > >     |from() 
> > >         .database('telegraf') 
> > >         .measurement('disk') 
> > >         .groupBy('host', 'device') 
> > >     |alert() 
> > >         .warn(lambda: "used_percent" >= 80) 
> > >         .warnReset(lambda: "used_percent" < 80) 
> > >         .crit(lambda: "used_percent" >= 90) 
> > >         .critReset(lambda: "used_percent" < 90) 
> > >         .stateChangesOnly() 
> > > 
> > > 
> > > 
> > > On Wednesday, February 22, 2017 at 7:14:23 PM UTC-5, Archie Archbold 
> wrote: 
> > > > Interestingly enough, when I add the .noRecoveries() property to the 
> alert node, I only get one DOWN alert even though there are 7 servers that 
> are within the alert range. 
> > > > 
> > > > On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, 
> [email protected] wrote: 
> > > > If you want to ignore the OK alerts, use the `.noRecoveries` property 
> of the alert node. It will suppress the OK alerts. 
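> > > > 
> > > > For example, chained onto the alert node (a minimal sketch, reusing the 
> 80% threshold from the script in this thread): 
> > > > 
> > > >     |alert() 
> > > >         .warn(lambda: "used_percent" >= 80) 
> > > >         .noRecoveries() 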
> > > > 
> > > > On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold 
> wrote: 
> > > > Hey all. Pretty new to TICK but I have a problem that I can't wrap 
> my head around. 
> > > > 
> > > > 
> > > > I am monitoring multiple servers, all sending data to one InfluxDB 
> database, and using the 'host' tag to separate the servers in the DB. 
> > > > 
> > > > 
> > > > My 'disk' measurement is taking in multiple disk paths from the 
> servers (HOSTS), which each have a respective 'PATH' tag. 
> > > > 
> > > > 
> > > > So basically each server is assigned a HOST tag and each HOST has 
> multiple PATH tags. 
> > > > 
> > > > 
> > > > EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state change of 
> a HOST's PATH if that path is within the alerting lambda. 
> > > > PROBLEM: When I start the Kapacitor service, it looks like it's 
> sensing a state change any time it sees another host/path with an opposite 
> status. 
> > > > 
> > > > 
> > > > This is a simplified example of the alerts I am getting: 
> > > > 
> > > > 
> > > > Host: host1  Path: /path1  Status: UP 
> > > > Host: host1  Path: /path2  Status: DOWN 
> > > > Host: host1  Path: /path3  Status: UP 
> > > > Host: host2  Path: /path1 Status: DOWN 
> > > > Host: host2  Path: /path2  Status: UP 
> > > > 
> > > > 
> > > > 
> > > > These alerts happen once for each host/path combination, and then the 
> service performs as expected, alerting properly when the lambda condition is met. 
> > > > 
> > > > 
> > > > The result of this is that I receive a slew of up/down alerts every 
> time I restart the Kapacitor service. 
> > > > 
> > > > 
> > > > Here is my current TICKscript: 
> > > > 
> > > > 
> > > > 
> > > > var data = stream 
> > > >     |from() 
> > > >         .measurement('disk') 
> > > >         .groupBy('host', 'path') 
> > > >     |alert() 
> > > >         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}') 
> > > >         .warn(lambda: "used_percent" >= 80) 
> > > >         .id('DISK SPACE WARNING') 
> > > >         .email($DISK_WARN_GRP) 
> > > > And the corresponding `kapacitor show` output, including the DOT: 
> > > > 
> > > > ID: disk_alert_warn 
> > > > Error:  
> > > > Template:  
> > > > Type: stream 
> > > > Status: enabled 
> > > > Executing: true 
> > > > Created: 17 Feb 17 22:27 UTC 
> > > > Modified: 17 Feb 17 22:27 UTC 
> > > > LastEnabled: 17 Feb 17 22:27 UTC 
> > > > Databases Retention Policies: ["main"."autogen"] 
> > > > TICKscript: 
> > > > var data = stream 
> > > >     |from() 
> > > >         .measurement('disk') 
> > > >         .groupBy('host', 'path') 
> > > >     |alert() 
> > > >         .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}') 
> > > >         .warn(lambda: "used_percent" >= 80) 
> > > >         .id('DISK SPACE WARNING') 
> > > >         .email() 
> > > > 
> > > > DOT: 
> > > > digraph disk_alert_warn { 
> > > > graph [throughput="38.00 points/s"]; 
> > > > 
> > > > stream0 [avg_exec_time_ns="0s" ]; 
> > > > stream0 -> from1 [processed="284"]; 
> > > > 
> > > > from1 [avg_exec_time_ns="3.9µs" ]; 
> > > > from1 -> alert2 [processed="284"]; 
> > > > 
> > > > alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ]; 
> > > > } 
> > > > As you can see, I get 7 oks triggered (for the host/path groups that 
> are not in the alert range) and 7 warns triggered (for the 7 host/path groups 
> that are within the alert range) on startup. 
> > > > After that it behaves normally. 
> > > > 
> > > > 
> > > > I understand that it should be alerting for the 7 host/path groups 
> that are over 80%, but why follow that with alerts for the OK groups? 
> > > > 
> > > > 
> > > > MORE INFO: When I raise the lambda to 90% (out of range for all 
> host/paths), I get no alerts at all (which is expected). 
> > > > 
> > > > 
> > > > Thanks to anyone who can help me understand this. 
>
>
