Kapacitor streams may fit your use case, if I am understanding correctly. It sounds like your data arrives days late, but you are only aggregating 10 minutes of data at a time. Does all the data arrive late, or just some of it? If it all arrives late and in order, then Kapacitor will be a good fit. Since you are only aggregating data over 10m intervals, Kapacitor will only need to keep 10m of data in RAM. Kapacitor doesn't care what the current system time is; it only concerns itself with the timestamps on the data itself. So as long as you are writing the data to Kapacitor in order, it will process your data correctly no matter how much the data has lagged.
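To make the point about timestamps concrete, here is a rough Python sketch of windowing driven purely by the timestamps on the data. It is only an illustration of the idea, not Kapacitor's implementation; the function name `windowed_means` and the `(timestamp, value)` tuple format are made up for the example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

WINDOW_SECONDS = 600  # 10 minutes, matching the 10m aggregation interval


def windowed_means(
    points: Iterable[Tuple[int, float]], window: int = WINDOW_SECONDS
) -> Iterator[Tuple[int, float]]:
    """Yield (window_start, mean) for in-order (timestamp, value) points.

    Hypothetical helper: the system clock is never consulted, only the
    timestamps carried by the points, and only the current window is buffered.
    """
    current_start: Optional[int] = None
    buffer: List[float] = []
    for ts, value in points:
        start = ts - ts % window  # window this point belongs to
        if current_start is not None and start != current_start:
            # A point from a later window arrived, so the previous one is done.
            yield current_start, sum(buffer) / len(buffer)
            buffer = []
        current_start = start
        buffer.append(value)
    if buffer and current_start is not None:
        # Flush the last window once the input ends (a simplification
        # made for this sketch so the final window is visible).
        yield current_start, sum(buffer) / len(buffer)
```

Feeding it points stamped days in the past gives the same windows as feeding it fresh points, because only the order and the timestamps matter.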
If for some reason only some of the data lags, then using a batch task would be much easier. I would write the batch task such that it is idempotent in time, meaning that running the task multiple times writes points at the exact same times. You will find that the `.align` property is quite useful here. This way, if you rerun the task on historical data, it will just overwrite the existing data with the updated values at the exact same times.

To trigger a historical rewrite you can use the CLI `kapacitor replay-live batch ...` command or the API directly: https://docs.influxdata.com/kapacitor/v1.1/api/api/#replay-data-without-recording. As for detecting when to replay the task, you will have to drive that externally to Kapacitor. In both cases you will want to set `recording-time` to true so that it uses the lagged times.

In summary, if your data arrives in order, use a stream task and it should just work :). Otherwise, set up a batch task and a schedule on which to replay it against historical data.

On Monday, November 21, 2016 at 3:58:19 PM UTC-7, Sean Beckett wrote:
>
> In stream mode, Kapacitor holds all data in RAM. Since the data may be
> arriving days late, I don't think that's a tenable solution.
>
> If the data gaps can be identified easily, you just need a Kapacitor batch
> to COUNT() points in an interval. If the number doesn't match, then have
> Kapacitor issue the INTO query.
>
> If your data is not regular, then I don't see how to detect the absence of
> a signal.
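For the batch approach, this is roughly what I mean by "idempotent in time", sketched in Python rather than TICKscript. It is a conceptual illustration only: `aggregate_aligned`, `run_task`, and the dict standing in for the output measurement are all hypothetical names, not anything Kapacitor provides.

```python
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 600  # 10 minutes


def aggregate_aligned(
    points: Iterable[Tuple[int, float]], window: int = WINDOW_SECONDS
) -> Dict[int, float]:
    """Return {aligned_window_start: mean} for (timestamp, value) points."""
    groups: Dict[int, List[float]] = {}
    for ts, value in points:
        start = ts - ts % window  # align to a fixed window boundary
        groups.setdefault(start, []).append(value)
    return {start: sum(vals) / len(vals) for start, vals in groups.items()}


def run_task(
    raw_points: Iterable[Tuple[int, float]],
    output: Dict[int, float],
    window: int = WINDOW_SECONDS,
) -> None:
    """Write aggregates keyed by aligned time, so a rerun overwrites in place."""
    output.update(aggregate_aligned(raw_points, window))


out: Dict[int, float] = {}
run_task([(0, 1.0), (600, 10.0)], out)              # first run, one point missing
run_task([(0, 1.0), (300, 3.0), (600, 10.0)], out)  # rerun after late data arrived
```

Because the output times are aligned to fixed boundaries, the second run replaces the value for the affected window instead of adding a new point next to it.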
