Thanks Sean,

It wasn't the CQs that were lagging, but the raw inserts themselves. 
Querying the last 60 seconds of raw (non-CQ) data gave low numbers, in the 
1-200 per-minute range. Moving the 1-minute query window back by, e.g., 10 
seconds at a time, the sum from the raw data grew progressively larger, and 
only stabilized once the window was about 50 seconds behind the current time. 

To illustrate, the numbers looked similar to this (made-up but realistic) 
sequence of queries:

$ select sum(value) from "router.rpm" ... where time > now() - 70s and time <= now() - 10s
200
$ select sum(value) from "router.rpm" ... where time > now() - 80s and time <= now() - 20s
300
$ select sum(value) from "router.rpm" ... where time > now() - 90s and time <= now() - 30s
400
$ select sum(value) from "router.rpm" ... where time > now() - 100s and time <= now() - 40s
500
$ select sum(value) from "router.rpm" ... where time > now() - 110s and time <= now() - 50s
600
$ select sum(value) from "router.rpm" ... where time > now() - 120s and time <= now() - 60s
600
$ select sum(value) from "router.rpm" ... where time > now() - 130s and time <= now() - 70s
600
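
For anyone wanting to reproduce this kind of probe, here is a rough sketch that slides the same 60-second window back in 10-second steps via the influx CLI. The tag filters elided with "..." above would need to be added to the WHERE clause, and the database/measurement names are from my setup:

#!/bin/bash
# Slide a fixed 60-second window back in 10-second steps and print the sum
# for each window, to see how far behind "now" the raw data stabilizes.
DB="ac54edda_6a34_4b8b_99d3_a949fb3c8994"
for offset in 10 20 30 40 50 60 70; do
  start=$((offset + 60))
  influx -database "$DB" -execute \
    "SELECT sum(value) FROM retention_1d.\"router.rpm\" WHERE time > now() - ${start}s AND time <= now() - ${offset}s"
done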

I.e. there were delays of up to 50 seconds in inserting some of the raw data 
points. At first I thought there might be a delay in sending the data to 
InfluxDB, but after rebooting the server the above lag disappeared 
completely, and all of the queries returned ~600 RPM. I believe this is why 
the SUM in the CQ was low: it was operating on the most recent data, some of 
which hadn't arrived yet when the CQ ran.
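
As I understand the default CQ behaviour, each run only aggregates the most recent completed interval, so every minute it effectively executes something like the following (timestamps purely illustrative); any points for that minute that arrive after the run are never re-summed unless RESAMPLE is used:

SELECT sum(value) INTO ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_4w."router.rpm.1m.sum"
FROM ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
WHERE time >= '2016-09-02T09:34:00Z' AND time < '2016-09-02T09:35:00Z'
GROUP BY time(1m)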

> What are the results of "SELECT COUNT(value) FROM 
ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm" WHERE time > 
now() - 15m GROUP BY time(1m)"?

Since rebooting the server, the results are as expected:

2016-09-02T09:20:00Z  497
2016-09-02T09:21:00Z  608
2016-09-02T09:22:00Z  569
2016-09-02T09:23:00Z  613
2016-09-02T09:24:00Z  610
2016-09-02T09:25:00Z  553
2016-09-02T09:26:00Z  594
2016-09-02T09:27:00Z  605
2016-09-02T09:28:00Z  539
2016-09-02T09:29:00Z  494
2016-09-02T09:30:00Z  510
2016-09-02T09:31:00Z  534
2016-09-02T09:32:00Z  550
2016-09-02T09:33:00Z  545
2016-09-02T09:34:00Z  544
2016-09-02T09:35:00Z  199

However, when I ran this kind of query prior to rebooting the server, the 
most recent minute always had a count in the ~200 range, due to the insert 
lag.
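
Going forward, I plan to spot this kind of discrepancy earlier by comparing the downsampled series against the same aggregation computed on the fly from the raw data. A sketch (if I read the INTO behaviour correctly, the CQ writes its result into a field named "sum"):

$ select sum from retention_4w."router.rpm.1m.sum" where time > now() - 15m
$ select sum(value) from retention_1d."router.rpm" where time > now() - 15m group by time(1m)

If the two disagree for minutes that closed more than a minute or two ago, that would again point at late-arriving inserts rather than at the CQ itself.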

The problem has fixed itself since the reboot; I was mainly curious whether 
this is a known problem, and whether there is anything I can do to avoid 
this kind of lag in the future. I realise it is quite hard to diagnose or 
give advice now that the data in question has been removed by the retention 
policy, but if you have any thoughts or advice I'd appreciate it.
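
For future reference, the next time this happens I'll try to capture the write-path counters before restarting, e.g. via SHOW STATS and (assuming the monitor service is enabled) the _internal database. The exact measurement and field names may vary by version, so this is just the direction I'd look:

$ influx -execute 'SHOW STATS'
$ influx -database _internal -execute 'SELECT * FROM "monitor"."httpd" WHERE time > now() - 5m LIMIT 5'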

Thanks!

On Wednesday, August 31, 2016 at 1:36:14 AM UTC+9, Sean Beckett wrote:
>
> I'm not sure why the CQs would have started lagging, but in case it starts 
> to happen again, you can set the CQs to recalculate prior intervals, too. 
> That will help with the backfill:
>
> CREATE CONTINUOUS QUERY router_rpm_1m_sum ON ac54edda_6a34_4b8b_99d3_a949fb3c8994
> RESAMPLE FOR 5m
> BEGIN
> SELECT sum(value) INTO ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_4w."router.rpm.1m.sum"
> FROM ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
> GROUP BY time(1m) END
>
> That will cause the CQ to recalculate the 1 minute buckets for the prior 
> five minutes each time it runs. 
>
> However, if the CQs are lagging because they can't execute in time, that 
> will just make the issue worse.
>
> What are the results of "SELECT COUNT(value) FROM ac54edda_6a34_4b8b_99d3_
> a949fb3c8994.retention_1d."router.rpm" WHERE time > now() - 15m GROUP BY 
> time(1m)"?
>
>
> On Tue, Aug 30, 2016 at 5:23 AM, <[email protected]> wrote:
>
>> After further investigation, it seems that more than half of the inserts 
>> were lagging by up to 1min for some reason. Since continuous queries don't 
>> backfill, the continuous query sums were low, but checking the raw data 
>> showed the correct numbers since the data had appeared there later.
>>
>> Rebooting the InfluxDB server has corrected the issue and the numbers are 
>> now correct again, but I'm curious what would cause this kind of insert lag.
>>
>> On Tuesday, August 30, 2016 at 11:23:23 AM UTC+9, dave wrote:
>> > Hi, I've noticed recently that at least one of my continuous queries 
>> doesn't contain all of the data that the raw series contains. See 
>> http://imgur.com/a/R76G6 for an example comparison of the raw data vs. 
>> the continuous query data. The continuous query is defined as follows:
>> >
>> > CREATE CONTINUOUS QUERY router_rpm_1m_sum ON 
>> ac54edda_6a34_4b8b_99d3_a949fb3c8994 BEGIN
>> > SELECT sum(value) INTO 
>> ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_4w."router.rpm.1m.sum"
>> > FROM ac54edda_6a34_4b8b_99d3_a949fb3c8994.retention_1d."router.rpm"
>> > GROUP BY time(1m) END
>> >
>> > This wasn't always the case - up until several weeks ago it was 
>> recording the correct data. I noticed a dip in our traffic graph, and 
>> assumed that traffic had decreased, but recently checking the raw 
>> (non-continuous) data I discovered that this was not the case. 
>> Unfortunately the correct data has been deleted due to the retention 
>> policy, so I can no longer compare it.
>> >
>> > Server load seems low (load average 0.09, CPU ~2%, 28% memory used, 13% 
>> disk space used), and restarting InfluxDB hasn't helped.
>> >
>> > I'm running v0.13.0 on Ubuntu 14.04.4 LTS (
>> https://s3.amazonaws.com/dl.influxdata.com/influxdb/releases/influxdb_0.13.0_amd64.deb
>> ).
>> >
>> > Any idea what is going on here, or what my next steps might be to 
>> diagnose and fix the issue?
>> >
>> > Thanks
>>
>
>
>
> -- 
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
>
