I was thinking something much simpler. Use streaming mode to extract a set 
of normalized timestamps per aggregation interval (say, if my aggregation is 10 
minutes, just keep a set of N = (time - (time % 10 minutes))) and discard the 
rest of the data - and then every 10 minutes execute an "INTO" query for every 
"timestamp to timestamp + 10 minutes" window in the resulting set. Normally you 
will only get one item in that set, but if any late data arrives, its window 
will also be refreshed. This way we do not have to worry about late items 
without having to do a lot of extra work refreshing the aggregates every time, 
and the memory requirements are pretty minimal.
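
To make it concrete, here is roughly what I have in mind, sketched in Python 
with the influxdb-python client (the measurement names "raw" / "raw_10m", the 
"value" field and the mean() aggregate are just placeholders, not anything 
prescribed):

from influxdb import InfluxDBClient

WINDOW = 600  # aggregation interval in seconds (10 minutes)
client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# The only state we keep: normalized window-start timestamps (epoch seconds)
# that need (re)aggregation.
dirty_windows = set()

def on_point(point_epoch):
    """Called for every incoming point; keep only N = time - (time % WINDOW)."""
    dirty_windows.add(point_epoch - (point_epoch % WINDOW))

def flush():
    """Run every WINDOW seconds: re-run the INTO query for each dirty window."""
    for start in sorted(dirty_windows):
        q = ('SELECT mean("value") INTO "raw_10m" FROM "raw" '
             'WHERE time >= {0}s AND time < {1}s '
             'GROUP BY *').format(start, start + WINDOW)
        client.query(q)
    dirty_windows.clear()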

This is not too hard to implement outside of Influx, but it would require 
extra code running on a regular basis, a new DB to store the normalized 
timestamp set (not sure if Influx has the ability to store "set" data), and 
some sort of proxy filter in front of the DB before data hits it. Sounds like a 
recipe for trouble having so many extra moving pieces.

While it would be nice if CQs just supported this out of the box, I was hoping 
that I could at least use Kapacitor to implement this without so many external 
components.

Aside - I was thinking of at least not requiring a separate DB for set storage, 
but I have no idea how to best store a "set" in InfluxDB (it may not even be 
possible). I suppose I could set up a measurement to store the normalized 
timestamp as a value, but that means there is an extra data point for every 
point inserted, which is very wasteful as I only care about one instance of the 
normalized item timestamp per aggregation window.... actually, now that I am 
writing this, it occurs to me that I could use tags instead of values to limit 
the entry to a single item (using the fact that you can only have one instance 
of a given timestamp/tag combination to make it behave like a set)....

So, something like this:

Set up a measurement with tags:
** measurement
** tagset
** timewindow

and for every item I see, insert an entry using:

time: <normalized current timestamp> 
tag:measurement: <measurement of item being recorded>
tag:tagset: <tagset of the item being recorded>
tag:timewindow: <normalized timestamp from item being recorded>
value: true (hardcoded; the value itself is irrelevant)

Then I can query that measurement by time and group by tags, which will produce 
a listing of time windows for each series (as defined by the measurement/tagset 
tags) to be (re)aggregated.
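
In rough Python (again with the influxdb-python client; the "agg_windows" 
measurement name and the way I encode the tagset are just illustrations), that 
would look something like:

from influxdb import InfluxDBClient

WINDOW = 600  # aggregation interval in seconds
client = InfluxDBClient(host="localhost", port=8086, database="mydb")

def normalize(epoch):
    return epoch - (epoch % WINDOW)

def record_window(src_measurement, src_tagset, point_epoch, now_epoch):
    # One point per series/window combination; re-inserting the same
    # timestamp + tag set within the current window simply overwrites it,
    # which is what gives the "set" semantics.
    client.write_points([{
        "measurement": "agg_windows",
        "tags": {
            "measurement": src_measurement,
            "tagset": src_tagset,                       # e.g. "host=a,region=us"
            "timewindow": str(normalize(point_epoch)),  # window of the item itself
        },
        "time": normalize(now_epoch),                   # normalized current time
        "fields": {"value": True},                      # hardcoded, irrelevant
    }], time_precision="s")

def windows_to_refresh(lookback="10m"):
    # Group by the tags to get one series per measurement/tagset/timewindow.
    rs = client.query(
        ('SELECT * FROM "agg_windows" WHERE time >= now() - {0} '
         'GROUP BY "measurement", "tagset", "timewindow"').format(lookback)
    )
    return [tags for _, tags in rs.keys()]  # tag dicts to (re)aggregate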

That seems like it would work, at least in theory, but I am concerned that this 
type of (ab)use of InfluxDB will generate a ton of tiny series - and I have no 
idea whether InfluxDB would handle that at scale - simply storing this data in 
a set in something like Redis would be much simpler... :-/
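
For comparison, the Redis version of the same "set of windows to refresh" is 
close to trivial (the "dirty:" key layout is just an illustration, assuming 
redis-py):

import redis

WINDOW = 600
r = redis.Redis(decode_responses=True)

def record_window(series_key, point_epoch):
    # One Redis set per series; members are normalized window-start timestamps.
    r.sadd("dirty:" + series_key, point_epoch - (point_epoch % WINDOW))

def pop_windows(series_key):
    # Grab and clear the set of windows that need (re)aggregation.
    key = "dirty:" + series_key
    windows = r.smembers(key)
    r.delete(key)
    return sorted(int(w) for w in windows)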
 
Thanks,

-M

On Monday, November 21, 2016 at 2:58:19 PM UTC-8, Sean Beckett wrote:
> In stream mode, Kapacitor holds all data in RAM. Since the data may be 
> arriving days late, I don't think that's a tenable solution.
> 
> 
> If the data gaps can be identified easily, you just need a Kapacitor batch to 
> COUNT() points in an interval. If the number doesn't match, then have 
> Kapacitor issue the INTO query.
> 
> 
> If your data is not regular, then I don't see how to detect the absence of a 
> signal.
