That worked perfectly. After looking at the documentation again, your explanation makes total sense. Thank you so much for your time and help.
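For anyone who finds this later: below is roughly the script we ended up with. It's a sketch assembled from the snippets further down this thread (the 'telegraf' database, 'disk' measurement, "used_percent" field, and 80/90 thresholds all come from the posts below; adjust them for your own setup):

    stream
        |from()
            .database('telegraf')
            .measurement('disk')
            .groupBy('host', 'path')
        |alert()
            // Unique ID per (host, path) group, so each group tracks
            // its own alert state instead of all sharing one.
            .id('{{ .Tags.host }}/{{ .Tags.path }}/disk_usage')
            .message('{{ .ID }} is {{ .Level }}: used_percent = {{ index .Fields "used_percent" }}')
            .warn(lambda: "used_percent" >= 80)
            .warnReset(lambda: "used_percent" < 80)
            .crit(lambda: "used_percent" >= 90)
            .critReset(lambda: "used_percent" < 90)
            .stateChangesOnly()
            .email()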
On Monday, June 26, 2017 at 8:15:59 AM UTC-7, [email protected] wrote:
> In case anyone else has this problem, the solution involves setting the
> alert ID to something unique.
>
> AFAIK this isn't documented very well, but each alert has an associated
> identifier which is used internally to track the alert state. If you use
> groupBy to split a metric stream into multiple alerts but use the same
> ID, then a change in the alert condition for any one metric stream will
> cause the alert to change state.
>
> So, for example, in the case of disk usage, you would do something like:
>
>     |groupBy('host', 'path')
>     |alert()
>         .id('{{ .Tags.host }}/{{ .Tags.path }}/disk_usage')
>
> On Monday, June 12, 2017 at 6:51:55 PM UTC-4, [email protected] wrote:
> > From some limited testing, it seems like the problem is that if a
> > particular host (say, 'host1') has a WARN/CRITICAL for a (host1,
> > device1) grouping, as well as an OK for a (host1, device2) grouping,
> > alerts will be generated for both device1 and device2, even though
> > only device1 is in an alert state.
> >
> > I've tested this hypothesis on a host that has no groupings in an
> > alert state, and on one with just a single grouping in an alert
> > state. The host with no groupings in an alert state receives no
> > alerts.
> >
> > Can anyone make sense of this?
> >
> > I'm using Kapacitor 1.3.1, BTW.
> >
> > On Monday, June 12, 2017 at 12:17:25 PM UTC-4, [email protected] wrote:
> > > Any updates on this?
> > >
> > > We're having the same problem. Restart Kapacitor or re-define the
> > > task, and we get spammed with alerts saying everything is OK (even
> > > from hosts which never entered a non-OK state).
> > >
> > > Our TICKscript is pretty simple (and very similar to the OP's):
> > >
> > >     stream
> > >         |from()
> > >             .database('telegraf')
> > >             .measurement('disk')
> > >             .groupBy('host', 'device')
> > >         |alert()
> > >             .warn(lambda: "used_percent" >= 80)
> > >             .warnReset(lambda: "used_percent" < 80)
> > >             .crit(lambda: "used_percent" >= 90)
> > >             .critReset(lambda: "used_percent" < 90)
> > >             .stateChangesOnly()
> > >
> > > On Wednesday, February 22, 2017 at 7:14:23 PM UTC-5, Archie Archbold wrote:
> > > > Interestingly enough, when I add the .noRecoveries() property to
> > > > the alert node, I only get one DOWN alert, even though there are
> > > > 7 servers within the alert range.
> > > >
> > > > On Wednesday, February 22, 2017 at 11:10:09 AM UTC-8, [email protected] wrote:
> > > > > If you want to ignore the OK alerts, use the `.noRecoveries`
> > > > > property of the alert node. This will suppress the OK alerts.
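For reference, here is a minimal sketch of where that `.noRecoveries()` call sits on the alert node, assuming the same disk/used_percent setup discussed in this thread. Note that it only suppresses the OK/recovery notifications; with a static shared ID, all groups would still share one alert state, which may be why only a single DOWN alert came through above:

    stream
        |from()
            .measurement('disk')
            .groupBy('host', 'path')
        |alert()
            .warn(lambda: "used_percent" >= 80)
            // Drop the recovery (OK) notifications entirely.
            .noRecoveries()
            .email()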
> > > > > On Friday, February 17, 2017 at 3:33:16 PM UTC-7, Archie Archbold wrote:
> > > > > > Hey all. Pretty new to TICK, but I have a problem that I can't
> > > > > > wrap my head around.
> > > > > >
> > > > > > I am monitoring multiple servers, all sending data to one
> > > > > > InfluxDB database, and using the 'host' tag to separate the
> > > > > > servers in the DB.
> > > > > >
> > > > > > My 'disk' measurement is taking in multiple disk paths from the
> > > > > > servers (HOSTS), each of which has a respective 'PATH' tag.
> > > > > >
> > > > > > So basically, each server is assigned a HOST tag and each HOST
> > > > > > has multiple PATH tags.
> > > > > >
> > > > > > EXPECTED FUNCTIONALITY: Kapacitor should alert upon a state
> > > > > > change of a HOST's PATH if that path is within the alerting
> > > > > > lambda.
> > > > > >
> > > > > > PROBLEM: When I start the Kapacitor service, it looks like it's
> > > > > > sensing a state change any time it sees another host/path with
> > > > > > an opposite status.
> > > > > >
> > > > > > This is a simplified example of the alerts I am getting:
> > > > > >
> > > > > >     Host: host1  Path: /path1  Status: UP
> > > > > >     Host: host1  Path: /path2  Status: DOWN
> > > > > >     Host: host1  Path: /path3  Status: UP
> > > > > >     Host: host2  Path: /path1  Status: DOWN
> > > > > >     Host: host2  Path: /path2  Status: UP
> > > > > >
> > > > > > These alerts happen once for each host/path combination, and
> > > > > > then the service performs as expected, alerting properly when
> > > > > > the lambda condition is met.
> > > > > >
> > > > > > The result is that I receive a slew of up/down alerts every
> > > > > > time I restart the Kapacitor service.
> > > > > >
> > > > > > Here is my current TICKscript:
> > > > > >
> > > > > >     var data = stream
> > > > > >         |from()
> > > > > >             .measurement('disk')
> > > > > >             .groupBy('host', 'path')
> > > > > >         |alert()
> > > > > >             .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
> > > > > >             .warn(lambda: "used_percent" >= 80)
> > > > > >             .id('DISK SPACE WARNING')
> > > > > >             .email($DISK_WARN_GRP)
> > > > > >
> > > > > > And the corresponding task definition and DOT:
> > > > > >
> > > > > >     ID: disk_alert_warn
> > > > > >     Error:
> > > > > >     Template:
> > > > > >     Type: stream
> > > > > >     Status: enabled
> > > > > >     Executing: true
> > > > > >     Created: 17 Feb 17 22:27 UTC
> > > > > >     Modified: 17 Feb 17 22:27 UTC
> > > > > >     LastEnabled: 17 Feb 17 22:27 UTC
> > > > > >     Databases Retention Policies: ["main"."autogen"]
> > > > > >     TICKscript:
> > > > > >     var data = stream
> > > > > >         |from()
> > > > > >             .measurement('disk')
> > > > > >             .groupBy('host', 'path')
> > > > > >         |alert()
> > > > > >             .message('{{ .ID }} Server:{{ index .Tags "host" }} Path: {{ index .Tags "path" }} USED PERCENT: {{ index .Fields "used_percent" }}')
> > > > > >             .warn(lambda: "used_percent" >= 80)
> > > > > >             .id('DISK SPACE WARNING')
> > > > > >             .email()
> > > > > >
> > > > > >     DOT:
> > > > > >     digraph disk_alert_warn {
> > > > > >         graph [throughput="38.00 points/s"];
> > > > > >
> > > > > >         stream0 [avg_exec_time_ns="0s" ];
> > > > > >         stream0 -> from1 [processed="284"];
> > > > > >
> > > > > >         from1 [avg_exec_time_ns="3.9µs" ];
> > > > > >         from1 -> alert2 [processed="284"];
> > > > > >
> > > > > >         alert2 [alerts_triggered="14" avg_exec_time_ns="72.33µs" crits_triggered="0" infos_triggered="0" oks_triggered="7" warns_triggered="7" ];
> > > > > >     }
> > > > > >
> > > > > > As you can see, I get 7 oks_triggered (for host/path groups that
> > > > > > are not in alert range) and 7 warns_triggered (for the 7
> > > > > > host/path groups that are within the alert range) upon startup.
> > > > > > Then it behaves as normal.
> > > > > >
> > > > > > I understand that it should be alerting for the 7 host/path
> > > > > > groups that are over 80, but why follow that with an alert for
> > > > > > the OK groups?
> > > > > >
> > > > > > MORE INFO: When I raise the lambda threshold to 90% (out of
> > > > > > range for all host/paths), I get no alerts at all (which is
> > > > > > expected).
> > > > > >
> > > > > > Thanks to anyone who can help me understand this.
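Reading the original script against the explanation at the top of the thread, the giveaway is `.id('DISK SPACE WARNING')`: because the ID doesn't vary with the group-by tags, all fourteen (host, path) groups appear to share a single alert state, so on startup each group's first point can flip that shared state and fire a notification, which would line up with the 7 warns and 7 oks in the DOT stats. A sketch of the one-line change, using the same index-style templating as the message:

    // Before: every (host, path) group shares one alert state.
    .id('DISK SPACE WARNING')

    // After: one alert state per (host, path) group.
    .id('{{ index .Tags "host" }}/{{ index .Tags "path" }}/disk_usage')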
