FYI, I dug in and found several optimizations that can be made to the union node. A PR is incoming.
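For context on why the union node buffers at all: it has to emit points from all of its parents in a single time-ordered stream, so it must hold points back until it knows nothing earlier can still arrive. A minimal Python sketch of that merge (an illustration only; the real node is written in Go and also tracks groups):

```python
import heapq

# Toy model of a union node: merge several time-ordered point streams
# into one time-ordered stream. Points are (time, value) tuples and
# each input stream is already sorted by time.
def union(*streams):
    # heapq.merge buffers at most one pending point per input stream
    # and emits the earliest one only once no earlier point can arrive.
    return list(heapq.merge(*streams, key=lambda point: point[0]))

last = [(1, 'last'), (4, 'last')]
last_active = [(2, 'last_active'), (3, 'last_active')]
print(union(last, last_active))
# [(1, 'last'), (2, 'last_active'), (3, 'last_active'), (4, 'last')]
```

The optimizations are about how little the node can buffer while still guaranteeing that ordering.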
On Friday, November 11, 2016 at 1:59:20 PM UTC-7, [email protected] wrote:
>
> Below is a script that gets closer but still suffers from the 2nd and 3rd
> issues you mentioned. It does fix the first issue.
>
> I'll file a GitHub issue to see if we can change elapsed to return 0 if
> it only receives a single point.
>
> The third issue, about buffering, is going to stay. The buffering is coming
> from the union node: it needs to make sure that it emits points ordered by
> time, so it buffers points until it knows no more points can arrive. (It did
> seem to be buffering too many points when I tested it, so I'll double-check
> whether it can be optimized.)
>
> You have me thinking; I'll be playing around with this some more to make
> sure I haven't missed something.
>
>     var data = batch
>         |query('select active from telegraf.autogen.postgresql_replication_slots')
>             .period(4h)
>             .every(10s)
>             .groupBy('host', 'slot_name')
>
>     var data_last = data
>         |last('active')
>             .as('active')
>             .usePointTimes()
>
>     var data_last_active = data
>         |where(lambda: "active" == TRUE)
>         |last('active')
>             .as('active')
>             .usePointTimes()
>
>     var data_union = data_last
>         |union(data_last_active)
>         |log()
>             .prefix('UNION STREAM')
>         |window()
>             .period(10s)
>             .every(10s)
>             .align()
>         |log()
>             .prefix('UNION BATCH')
>
>     var data_elapsed = data_union
>         |elapsed('active', 1s)
>             .as('elapsed')
>         |last('elapsed')
>             .as('elapsed')
>         |log()
>             .prefix('ELAPSED')
>
>     var data_count = data_union
>         |count('active')
>             .as('count')
>         |log()
>             .prefix('COUNT')
>
>     data_elapsed
>         |join(data_count)
>             .as('elapsed', 'count')
>             .fill('none')
>         |log()
>             .prefix('JOIN')
>
> On Friday, November 11, 2016 at 10:34:06 AM UTC-7, [email protected] wrote:
>>
>> I'm trying to create an alert when a specific field has been in a certain
>> state for too long.
>> Currently my TICKscript looks like this:
>>
>>     var data = batch
>>         |query('select active from telegraf.autogen.postgresql_replication_slots')
>>             .period(4h)
>>             .every(10s)
>>             .groupBy('host', 'slot_name')
>>
>>     var data_last = data|last('active').as('active')
>>     var data_last_active = data|where(lambda: "active" == TRUE)|last('active').as('active')
>>
>>     var data_union = data_last|union(data_last_active)
>>     var data_elapsed = data_union|elapsed('active', 1s)|log()
>>     var data_count = data_union|count('active').as('count')|log()
>>
>>     data_elapsed|join(data_count).as('elapsed', 'count').tolerance(10s).fill('none')|log()
>>
>> The idea is that `data_last` will be the last data point and
>> `data_last_active` will be the last data point where `active == true`. We
>> would then calculate the time difference between these two points; if the
>> difference is greater than X, generate an alert.
>> But we also want to handle the case where `data_last_active` is empty (no
>> match within the time period), so we also get the count of points, which in
>> that case would be 1.
>>
>> However, there are numerous problems with this:
>> 1. `elapsed()` is including the data points from the previous batch
>>    period, instead of just those within the batch. So if the batch period
>>    is 60s, then one of the elapsed values is going to be 60s.
>> 2. `elapsed()` won't emit anything at all if there is no previous data
>>    point, thus breaking the case where `data_last_active` is empty.
>> 3. `count()` is buffering, and doesn't release the data points until the
>>    next batch comes in.
>>
>> Here's an example of what the above generates:
>>
>> [test:log10] 2016/11/11 12:31:09 I!
>> {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"count":2},"Time":"2016-11-11T12:30:59.785563762-05:00"}
>>
>> [test:log10] 2016/11/11 12:31:09 I!
>> {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gdbs01qa","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gdbs01qa"},"Fields":{"count":1},"Time":"2016-11-11T12:30:59.785563762-05:00"}
>>
>> [test:log8] 2016/11/11 12:31:09 I!
>> {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"elapsed":10},"Time":"2016-11-11T17:31:09.785572403Z"}
>>
>> [test:log8] 2016/11/11 12:31:09 I!
>> {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gbar01stg","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gbar01stg"},"Fields":{"elapsed":0},"Time":"2016-11-11T17:31:09.785572403Z"}
>>
>> [test:log8] 2016/11/11 12:31:09 I!
>> {"Name":"postgresql_replication_slots","Database":"","RetentionPolicy":"","Group":"host=fll2gdbs01qa,slot_name=fll2gdbs01qa","Dimensions":{"ByName":false,"TagNames":["host","slot_name"]},"Tags":{"host":"fll2gdbs01qa","slot_name":"fll2gdbs01qa"},"Fields":{"elapsed":10},"Time":"2016-11-11T17:31:09.785572403Z"}
>>
>> Any suggestions to get this working?
>>
>> -Patrick

--
Remember to include the version number!
---
You received this message because you are subscribed to the Google Groups "InfluxData" group.
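As a footnote on the elapsed() problems from the original post, its per-batch behavior can be modeled in a few lines of Python (a sketch of the semantics only, not Kapacitor's implementation):

```python
# Toy model of elapsed(): emit the difference between each point's time
# and the previous point's time.
def elapsed(times):
    return [t - prev for prev, t in zip(times, times[1:])]

# Diffs within a batch look right, but if the node carries the last
# point of the previous batch forward, the first diff spans batches
# (issue 1):
print(elapsed([0, 10, 70]))  # [10, 60]

# A lone point has no previous point, so nothing is emitted at all
# (issue 2) -- hence the proposal to return 0 for a single point.
print(elapsed([42]))         # []
```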
