This is great work! We'll take a look at those issues to see if we can't 
help simplify this workflow. Thanks again.

On Wednesday, September 14, 2016 at 12:07:22 PM UTC-6, [email protected] 
wrote:
>
> Finally came up with a solution for this:
>
>     var interval = 5s
>     var threshold_warn = 65s
>     var threshold_crit = 305s
>     // Note: the alert will come in between (threshold + interval) and (threshold + interval * 2).
>     // This should be improved by https://github.com/influxdata/kapacitor/issues/898
>
>     var data = stream
>       |from().measurement('system').groupBy('host')
>       |stats(interval).align()
>       |derivative('emitted').unit(interval).as('emitted') // can't use difference() because of https://github.com/influxdata/kapacitor/issues/904
>
>     var data_warn_window = data
>       |window().period(threshold_warn).every(1u)
>     var data_warn_size = data_warn_window // will contain the number of points in the window
>       |count('emitted').as('value')
>     var data_warn = data_warn_window
>       |sum('emitted').as('value')
>       |join(data_warn_size).as('emitted', 'size')
>
>     var data_crit_window = data
>       |window().period(threshold_crit).every(1u)
>     var data_crit_size = data_crit_window // will contain the number of points in the window
>       |count('emitted').as('value')
>     var data_crit = data_crit_window
>       |sum('emitted').as('value')
>       |join(data_crit_size).as('emitted', 'size')
>
>     data_warn
>       |join(data_crit).as('warn', 'crit')
>       |alert()
>         .crit(lambda:
>           ("crit.size.value" * interval >= threshold_crit) // make sure we have a full window to prevent false alerts at start
>           AND ("crit.emitted.value" == 0)
>         )
>         .warn(lambda:
>           ("warn.size.value" * interval >= threshold_warn) // make sure we have a full window to prevent false alerts at start
>           AND ("warn.emitted.value" == 0)
>         )
>
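> For anyone who wants to try this, defining and enabling the task would look
> roughly like the following (the task and file names are just placeholders,
> and I'm assuming Telegraf writes to the `telegraf` database with the
> `default` retention policy):
>
>     kapacitor define no_data_alert -type stream -tick no_data_alert.tick -dbrp telegraf.default
>     kapacitor enable no_data_alert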
>
> Unfortunately, with this design it's not possible to include in the alert
> message how long the host has been unresponsive, only that the duration
> exceeds the threshold. But at least it works.
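>
> One could at least surface the host and alert level via the alert node's
> .message() template; an untested sketch (the template only sees the static
> thresholds, not the actual silence duration):
>
>         .message('{{ index .Tags "host" }} stopped reporting: {{ .Level }} threshold exceeded (as of {{ .Time }})')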
>
> -Patrick
>
>
> On Monday, September 12, 2016 at 5:30:36 PM UTC-4, [email protected] 
> wrote:
> > I'm trying to detect when nodes stop reporting data via Telegraf. However,
> > I don't want to use something like `deadman`, as I want multiple alert
> > levels, such as warn and critical: warn if no data has been received
> > for >60s, and critical for >300s (or similar).
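> > 
> > (For reference, a stock single-level check would be something like the
> > following, which only gives me one alert level:)
> > 
> >     stream
> >       |from().measurement('system').groupBy('host')
> >       |deadman(0.0, 60s)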
> > 
> > The only way I can think of to do this is taking a `stats()|derivative()`
> > node, copying it with a `where(lambda: "emitted" > 0)`, and then getting
> > the time difference between the last data point with the filter and the
> > last data point without it. But I can't figure out how to accomplish this.
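> > 
> > Roughly, I'm picturing something like the sketch below, minus the final
> > comparison step (which is the part I can't figure out):
> > 
> >     var emitted = stream
> >       |from().measurement('system').groupBy('host')
> >       |stats(5s)
> >       |derivative('emitted').unit(5s).as('emitted')
> >     var active = emitted
> >       |where(lambda: "emitted" > 0)
> >     // ...then somehow compare the time of the last point in `emitted`
> >     // with the last point in `active`?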
> > 
> > Any help would be appreciated.
> > 
> > Thanks
> > 
> > -Patrick
>
>
