Thanks Sean,

It wasn't the CQs that were lagging, but the raw inserts themselves. Querying the last 60 seconds of raw (non-CQ) data gave low numbers, in the 1-200 per-minute range. Moving the 1-minute query window back by e.g. 10 seconds at a time, the sum from the raw data became progressively larger, and it stabilized around 50 seconds behind the current time.
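For reference, the check can be scripted by stepping the window back in a small shell loop around the influx CLI. This is only a rough sketch (database and measurement names taken from the CQ quoted below, any tag filters omitted), not the exact commands I ran:

  # Slide a fixed 60s window back in 10s steps and sum the raw points in each window
  for offset in 10 20 30 40 50 60 70; do
    influx -database 'ac54edda_6a34_4b8b_99d3_a949fb3c8994' -execute \
      "SELECT sum(value) FROM retention_1d.\"router.rpm\" WHERE time > now() - $((offset + 60))s AND time <= now() - ${offset}s"
  done

If the sums grow as the window moves further back and only level off well behind now(), the newest points are arriving late.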
To illustrate the lag, the numbers were similar to this (made-up but realistic) sequence of queries:

$ select sum(value) from "router.rpm" ... where time > now() - 70s and time <= now() - 10s
200
$ select sum(value) from "router.rpm" ... where time > now() - 80s and time <= now() - 20s
300
$ select sum(value) from "router.rpm" ... where time > now() - 90s and time <= now() - 30s
400
$ select sum(value) from "router.rpm" ... where time > now() - 100s and time <= now() - 40s
500
$ select sum(value) from "router.rpm" ... where time > now() - 110s and time <= now() - 50s
600
$ select sum(value) from "router.rpm" ... where time > now() - 120s and time <= now() - 60s
600
$ select sum(value) from "router.rpm" ... where time > now() - 130s and time <= now() - 70s
600

I.e. there were delays of up to 50 seconds in inserting some of the raw data points. At first I thought there might be a delay in sending the data to InfluxDB, but after rebooting the server the lag above disappeared completely, and all of the queries returned ~600 RPM.

I believe this is why the SUM in the CQ was low: it was operating on the most recent data, which wasn't appearing on time.

> What are the results of "SELECT COUNT(value) FROM
> ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
> WHERE time > now() - 15m GROUP BY time(1m)"?

Since I have rebooted the server, the results are as expected:

2016-09-02T09:20:00Z 497
2016-09-02T09:21:00Z 608
2016-09-02T09:22:00Z 569
2016-09-02T09:23:00Z 613
2016-09-02T09:24:00Z 610
2016-09-02T09:25:00Z 553
2016-09-02T09:26:00Z 594
2016-09-02T09:27:00Z 605
2016-09-02T09:28:00Z 539
2016-09-02T09:29:00Z 494
2016-09-02T09:30:00Z 510
2016-09-02T09:31:00Z 534
2016-09-02T09:32:00Z 550
2016-09-02T09:33:00Z 545
2016-09-02T09:34:00Z 544
2016-09-02T09:35:00Z 199

However, when I ran this kind of query prior to rebooting the server, the most recent minute always had counts in the ~200 range, due to the insert lag.

The problem has fixed itself since rebooting; I was mainly curious whether this is a known problem, or whether there is anything I can do to avoid this lag in the future. I realise it is quite hard to diagnose and give advice when the data in question has already been removed by the retention policy, but if you have any thoughts or advice I'd appreciate it.

Thanks!

On Wednesday, August 31, 2016 at 1:36:14 AM UTC+9, Sean Beckett wrote:
>
> I'm not sure why the CQs would have started lagging, but in case it starts
> to happen again, you can set the CQs to recalculate prior intervals, too.
> That will help with the backfill:
>
> CREATE CONTINUOUS QUERY router_rpm_1m_sum ON ac54edda_6a34_4b8b_99d3_a949fb3c8994
> *RESAMPLE FOR 5m*
> BEGIN
> SELECT sum(value) INTO ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_4w."router.rpm.1m.sum"
> FROM ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
> GROUP BY time(1m) END
>
> That will cause the CQ to recalculate the 1-minute buckets for the prior
> five minutes each time it runs.
>
> However, if the CQs are lagging because they can't execute in time, that
> will just make the issue worse.
>
> What are the results of "SELECT COUNT(value) FROM
> ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
> WHERE time > now() - 15m GROUP BY time(1m)"?
>
>
> On Tue, Aug 30, 2016 at 5:23 AM, <[email protected]> wrote:
>
>> After further investigation, it seems that more than half of the inserts
>> were lagging by up to 1 min for some reason.
>> Since continuous queries don't backfill, the continuous query sums were
>> low, but checking the raw data showed the correct numbers since the data
>> had appeared there later.
>>
>> Rebooting the InfluxDB server has corrected the issue and the numbers are
>> now correct again, but I'm curious what would cause this kind of insert lag.
>>
>> On Tuesday, August 30, 2016 at 11:23:23 AM UTC+9, dave wrote:
>> > Hi, I've noticed recently that at least one of my continuous queries
>> > doesn't contain all of the data that the raw series contains. See
>> > http://imgur.com/a/R76G6 for an example comparison of the raw data vs.
>> > the continuous query data. The continuous query is defined as follows:
>> >
>> > CREATE CONTINUOUS QUERY router_rpm_1m_sum ON ac54edda_6a34_4b8b_99d3_a949fb3c8994 BEGIN
>> > SELECT sum(value) INTO ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_4w."router.rpm.1m.sum"
>> > FROM ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
>> > GROUP BY time(1m) END
>> >
>> > This wasn't always the case - up until several weeks ago it was
>> > recording the correct data. I noticed a dip in our traffic graph, and
>> > assumed that traffic had decreased, but recently checking the raw
>> > (non-continuous) data I discovered that this was not the case.
>> > Unfortunately the correct data has been deleted due to the retention
>> > policy, so I can no longer compare it.
>> >
>> > Server load seems low (load average 0.09, CPU ~2%, 28% memory used, 13%
>> > disk space used), and restarting InfluxDB hasn't helped.
>> >
>> > I'm running v0.13.0 on Ubuntu 14.04.4 LTS
>> > (https://s3.amazonaws.com/dl.influxdata.com/influxdb/releases/influxdb_0.13.0_amd64.deb).
>> >
>> > Any idea what is going on here, or what my next steps might be to
>> > diagnose and fix the issue?
>> >
>> > Thanks
>
> --
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
