The problem is probably the window node buffering data. The window will not 
emit the current batch until it has received a point that is more than a 
minute past the first point in the window. Since the 5xx points are sparse, 
you will only get an alert after the next 5xx event happens: a window 
opened by a 500 at, say, 12:00:10 is not emitted until another 5xx point 
arrives to push it past the one-minute period.
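
You can see this in your original script: the where filter runs inside 
from(), so only the sparse 5xx points ever reach the window node. The 
relevant excerpt, with the effect noted:

    |from()
        ...
        // Only 5xx points survive this filter, so the downstream window
        // node sees sparse data and cannot close a window until the
        // next 5xx point arrives.
        .where(lambda: "status_code" =~ /^5\d\d/)
    |window()
        .period(1m)
        .every(1m)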

> Would it be better to just stream every entry in that measurement and 
have the alert's warn level be based on a lambda that checks if the 
'status_code' is 5xx instead?

As you suggest here, windowing the entire data set will give you better, 
less laggy windows. Then you can filter each window down to just the 5xx 
errors and sum them per window:

var data = stream
    |from()
        .database('production')
        .retentionPolicy('default')
        .measurement('controller.action.count')
        .groupBy('component', 'controller', 'action', 'status_code')
    // Window everything first, so the steady request traffic keeps
    // the windows emitting on schedule.
    |window()
        .period(1m)
        .every(1m)
    // Then keep only the 5xx points in each window.
    |where(lambda: "status_code" =~ /^5\d\d/)
    |sum('value')
        .as('stat')
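
The idea is that the steady request traffic, rather than the sparse 5xx 
points, now drives the window emissions, so windows close every minute 
instead of waiting on the next error. The data is still grouped by 
status_code, so you get one summed point, and hence at most one alert 
evaluation, per [component, controller, action, status_code] combination 
per window.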

var alert = data
    |alert()
        .id('{{ index .Tags "component" }}::{{ index .Tags "controller" }}#{{ index .Tags "action" }} Error')
        .message('{{ .ID }}: {{ index .Fields "stat" }} {{ index .Tags "status_code" }} error(s) occurred')
        .warn(lambda: "stat" > 0)
        .topic('controller_errors')
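
One side note: without stateChangesOnly, the handler on the 
'controller_errors' topic will get an event for every window that contains 
at least one 5xx, which sounds like what you want. If you ever want a 
single notification when a combination first starts erroring instead, the 
alert node supports that directly; a minimal variation of the same alert:

    |alert()
        ...
        .warn(lambda: "stat" > 0)
        // Only publish events when the alert level changes,
        // e.g. OK -> WARNING.
        .stateChangesOnly()
        .topic('controller_errors')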

Give that a try.

On Tuesday, February 14, 2017 at 7:31:38 AM UTC-7, Chris TenHarmsel wrote:
>
> Hi everyone,
> I have a server that handles HTTP requests, sending a metric to InfluxDB 
> for each request; one of the data fields is the response code. I wanted 
> to set up a Kapacitor script to alert whenever a 5xx response is 
> generated, but I am seeing strange behavior.
>
> Here's my TICKscript:
>
> var data = stream
>     |from()
>         .database('production')
>         .retentionPolicy('default')
>         .measurement('controller.action.count')
>         .where(lambda: "status_code" =~ /^5\d\d/)
>         .groupBy('component', 'controller', 'action', 'status_code')
>     |window()
>         .period(1m)
>         .every(1m)
>     |sum('value')
>         .as('stat')
>
> var alert = data
>     |alert()
>         .id('{{ index .Tags "component" }}::{{ index .Tags "controller" }}#{{ index .Tags "action" }} Error')
>         .message('{{ .ID }}: {{ index .Fields "stat" }} {{ index .Tags "status_code" }} error(s) has occurred')
>         .warn(lambda: "stat" > 0)
>         .topic('controller_errors')
>
>
> The very first time a [component, controller, action] combination serves 
> up a 500 error I will get an alert on the topic (it outputs to Slack), 
> but never again for that combination.
>
> The handler listening to the topic does not have "stateChangesOnly" 
> specified.
>
> Is this the right way to go about this?  Would it be better to just stream 
> every entry in that measurement and have the alert's warn level be based on 
> a lambda that checks if the 'status_code' is 5xx instead?
>
> Btw, here's the output of `kapacitor show error_check`:
>
> digraph error_check {
> graph [throughput="9.00 points/s"];
>
> stream0 [avg_exec_time_ns="0s" ];
> stream0 -> from1 [processed="4431449"];
>
> from1 [avg_exec_time_ns="58.058µs" ];
> from1 -> window2 [processed="177"];
>
> window2 [avg_exec_time_ns="22.448µs" ];
> window2 -> sum3 [processed="15"];
>
> sum3 [avg_exec_time_ns="0s" ];
> sum3 -> alert4 [processed="15"];
>
> alert4 [alerts_triggered="15" avg_exec_time_ns="55.86451ms" 
> crits_triggered="0" infos_triggered="0" oks_triggered="0" 
> warns_triggered="15" ];
> }
>
> I know from our dashboards that we've had way more than 15 5xx errors 
> since that check started running.
>
> Any advice?
>
