Re: [influxdb] Dealing with lagging data in CQ's and Kapacitor tick scripts

nathaniel Mon, 21 Nov 2016 15:47:29 -0800

Kapacitor streams may fit your use case, if I am understanding correctly. It 
sounds like your data arrives days late, but you are only aggregating 
10 minutes of data at a time. Does all of the data arrive late, or just some 
of it? If it all arrives late and in order, then Kapacitor will be a good fit. 
Since you are only aggregating data in 10m intervals, Kapacitor will only 
need to keep 10m of data in RAM. Kapacitor doesn't care what the current 
system time is; it only concerns itself with the timestamps on the data 
itself. So as long as you are writing the data to Kapacitor in order, it 
will process your data correctly no matter how much the data has lagged.
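To illustrate why in-order lagged data is fine for a stream task, here is a simplified Python sketch (not Kapacitor's implementation) of a 10m tumbling window driven purely by the points' own timestamps. It assumes points arrive in timestamp order; the function name `tumbling_mean` and the sample values are hypothetical. Only the current window is buffered, and the wall clock is never consulted, so days-old points are grouped exactly like fresh ones.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

WINDOW = 600  # 10 minutes, in seconds


def tumbling_mean(
    points: Iterable[Tuple[int, float]], window: int = WINDOW
) -> Iterator[Tuple[int, float]]:
    """Yield (window_start, mean) per window using only the points' timestamps.

    Assumes points arrive in timestamp order, so only the current window
    is ever buffered; the system clock is never read.
    """
    current_start: Optional[int] = None
    buf: List[float] = []
    for ts, value in points:
        start = ts - ts % window  # window this point belongs to
        if current_start is not None and start != current_start:
            # A point from a later window arrived, so the previous one is complete.
            yield current_start, sum(buf) / len(buf)
            buf = []
        current_start = start
        buf.append(value)
    if buf:  # flush the last window when the input ends (demo only)
        yield current_start, sum(buf) / len(buf)


# Old timestamps are grouped the same way no matter when they are processed.
base = 1_479_600_000  # 2016-11-20T00:00:00Z, on a 10m boundary
points = [(base, 1.0), (base + 60, 3.0), (base + 600, 5.0),
          (base + 1199, 7.0), (base + 1200, 10.0)]
print(list(tumbling_mean(points)))
```

The memory footprint is one window's worth of values, which matches the point that a 10m aggregation only needs 10m of data held in RAM.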

If for some reason only some of the data lags, then using a batch task would 
be much easier. I would write the batch task so that it is idempotent in 
time, meaning that running the task multiple times writes points at 
exactly the same timestamps. You will find the `.align` property quite 
useful here. That way, if you rerun the task on historical data, it will 
simply overwrite the existing points with the updated values at the same 
timestamps. To trigger a historical rewrite you can use the CLI command 
`kapacitor replay-live batch ...` or call the API directly: 
https://docs.influxdata.com/kapacitor/v1.1/api/api/#replay-data-without-recording.
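A rough Python sketch of the "idempotent in time" idea, assuming `.align` is used so window boundaries fall on multiples of the interval. The dict keyed by (measurement, timestamp) stands in for InfluxDB, where writing a point with the same series and timestamp replaces the earlier field values. The names `aggregate_aligned`, `write`, and `agg_10m` are hypothetical, and a sum is used instead of a real aggregate for brevity.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW = 600  # 10 minutes, in seconds
Store = Dict[Tuple[str, int], float]


def aggregate_aligned(
    points: Iterable[Tuple[int, float]], start: int, stop: int, window: int = WINDOW
) -> Dict[int, float]:
    """Sum values in [start, stop) per window aligned to multiples of `window`.

    Output timestamps depend only on the data, not on when the task runs.
    """
    sums: Dict[int, float] = defaultdict(float)
    for ts, value in points:
        if start <= ts < stop:
            sums[ts - ts % window] += value
    return dict(sums)


def write(store: Store, measurement: str, rows: Dict[int, float]) -> None:
    # Same measurement and timestamp => the new value replaces the old one,
    # mimicking how InfluxDB treats a duplicate point in the same series.
    for ts, value in rows.items():
        store[(measurement, ts)] = value


base = 1_479_600_000  # 2016-11-20T00:00:00Z, on a 10m boundary
store: Store = {}
first_run = [(base + 10, 1.0), (base + 20, 2.0)]
write(store, "agg_10m", aggregate_aligned(first_run, base, base + WINDOW))

# A late point shows up; rerunning over the same range overwrites in place.
second_run = first_run + [(base + 30, 4.0)]
write(store, "agg_10m", aggregate_aligned(second_run, base, base + WINDOW))
print(store)
```

Because the second run lands on the same aligned timestamp, the store still holds a single point for that window, now with the corrected total, instead of a duplicate.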

As for deciding when to replay the task, you will have to drive that 
externally to Kapacitor. In both cases (CLI or API) you will want to set 
`recording-time` to true so that the replay keeps the original, lagged 
timestamps instead of shifting them to the present time.
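Since the replay schedule lives outside Kapacitor, the external driver mainly has to pick a time range. Below is a hypothetical Python helper (`replay_range` and `rfc3339` are illustrative names, not Kapacitor APIs) that computes an aligned [start, stop) range covering the last `max_lag` of data, ending at the last complete window. It assumes you already know how far back late data can arrive; passing the resulting times to `kapacitor replay-live batch` or the replay API is left to you.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def replay_range(now: datetime, max_lag: timedelta, window: timedelta) -> Tuple[datetime, datetime]:
    """Return an aligned [start, stop) range covering the last `max_lag` of data."""
    w = int(window.total_seconds())
    now_s = int((now - EPOCH).total_seconds())
    stop_s = now_s - now_s % w             # end of the last complete window
    start_s = stop_s - int(max_lag.total_seconds())
    start_s -= start_s % w                 # round start down to a window boundary
    return EPOCH + timedelta(seconds=start_s), EPOCH + timedelta(seconds=stop_s)


def rfc3339(t: datetime) -> str:
    return t.strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: data may lag up to 3 days; aggregation interval is 10m.
now = datetime(2016, 11, 21, 23, 47, 29, tzinfo=timezone.utc)
start, stop = replay_range(now, timedelta(days=3), timedelta(minutes=10))
print(rfc3339(start), rfc3339(stop))
```

Aligning both ends to the interval keeps each rerun covering whole windows, so combined with an aligned batch task every window is rewritten at the same timestamps each time.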

In summary: if your data arrives in order, use a stream task and it should 
just work :). Otherwise, set up a batch task and a schedule on which to 
replay it against historical data.

On Monday, November 21, 2016 at 3:58:19 PM UTC-7, Sean Beckett wrote:
>
> In stream mode, Kapacitor holds all data in RAM. Since the data may be 
> arriving days late, I don't think that's a tenable solution.
>
> If the data gaps can be identified easily, you just need a Kapacitor batch 
> task to COUNT() points in an interval. If the number doesn't match, then have 
> Kapacitor issue the INTO query.
>
> If your data is not regular, then I don't see how to detect the absence of 
> a signal.
> 
>
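The COUNT()-then-backfill check suggested in the quoted message can be sketched in Python as follows. It assumes regular data with a known expected count per interval (e.g. 60 points per 10m at a 10s cadence) and that the per-interval counts have already been fetched. The measurement names `cpu` and `cpu_10m`, the field `value`, and the use of `mean()` are hypothetical placeholders; the sketch only builds InfluxQL `SELECT ... INTO` statements and does not send them anywhere.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def backfill_queries(
    counts: Dict[datetime, int],
    expected: int,
    window: timedelta,
    src: str = "cpu",        # hypothetical source measurement
    dst: str = "cpu_10m",    # hypothetical downsampled measurement
    field: str = "value",    # hypothetical field name
) -> List[str]:
    """Return one InfluxQL INTO query per interval whose COUNT() differs from `expected`."""
    minutes = int(window.total_seconds() // 60)
    queries = []
    for start in sorted(counts):
        if counts[start] != expected:
            stop = start + window
            queries.append(
                f'SELECT mean("{field}") INTO "{dst}" FROM "{src}" '
                f"WHERE time >= '{start:%Y-%m-%dT%H:%M:%SZ}' AND time < '{stop:%Y-%m-%dT%H:%M:%SZ}' "
                f"GROUP BY time({minutes}m), *"
            )
    return queries


t0 = datetime(2016, 11, 21, 0, 0, tzinfo=timezone.utc)
counts = {t0: 60, t0 + timedelta(minutes=10): 42}  # second interval is short
qs = backfill_queries(counts, expected=60, window=timedelta(minutes=10))
for q in qs:
    print(q)
```

As the quoted message notes, this only works when the data is regular enough that a "correct" count is known in advance.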

-- 
Remember to include the version number!
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxData" group.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/98b2b468-523c-4bdc-ad69-370d161f77a5%40googlegroups.com.