I was thinking of something much simpler. Use the streaming mode to extract a set of normalized timestamps per aggregation interval (say, if my aggregation is 10 minutes, just keep a set of N = (time - (time % 10 minutes))) and discard the rest of the data - and then every 10 minutes execute an "INTO" query for every "timestamp to timestamp + 10 minutes" window in the resulting set. Normally there will only be one item in that set, but if any late data arrives, its window will also be re-aggregated. This way we do not have to worry about late items without having to do a lot of extra work refreshing the aggregates every time, and the memory requirements are pretty minimal.
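To make that concrete, here is a rough Python sketch of the bookkeeping I have in mind (the measurement names, the mean() aggregate, and the 10-minute window are just placeholders, not something I have running):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
touched_windows = set()  # normalized window-start timestamps seen since the last flush

def normalize(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 10-minute aggregation window,
    i.e. time - (time % 10 minutes). Timezone handling omitted for brevity."""
    epoch = int(ts.timestamp())
    return datetime.utcfromtimestamp(epoch - epoch % int(WINDOW.total_seconds()))

def on_point(ts: datetime) -> None:
    """Called for every incoming point; only the window it falls into is kept."""
    touched_windows.add(normalize(ts))

def flush() -> None:
    """Run every 10 minutes: (re)aggregate exactly the windows that saw data,
    which automatically covers windows touched by late-arriving points."""
    for start in sorted(touched_windows):
        end = start + WINDOW
        q = (f"SELECT mean(value) INTO agg_measurement FROM raw_measurement "
             f"WHERE time >= '{start.isoformat()}Z' AND time < '{end.isoformat()}Z' "
             f"GROUP BY time(10m), *")
        print(q)  # in practice this would be sent to InfluxDB's /query endpoint
    touched_windows.clear()
```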
This is not too hard to implement outside of Influx, but it would require extra code running on a regular basis, a new DB to store the normalized timestamp set (I am not sure Influx has the ability to store "set" data), and some sort of a proxy filter in front of the DB so I can see the data before it hits it. That sounds like a recipe for trouble with so many extra moving pieces. While it would be nice if CQs just supported this out of the box, I was hoping that at least I could use Kapacitor to implement this without all the external components.

Aside - I was at least hoping not to require a separate DB for the set storage, but I have no idea how to best store a "set" in InfluxDB (it may not even be possible). I suppose I could set up a measurement to store the normalized timestamp as a value, but that means an extra data point for every point inserted, which is very wasteful since I only care about one instance of the normalized timestamp per aggregation window.

Actually, now that I am writing this, it occurs to me that I could use tags instead of values to limit each entry to a single item (using the fact that you can only have one instance of a given timestamp/tag combination to make it behave like a set). So, something like this: set up a measurement with tags:

* measurement
* tagset
* timewindow

and for every item I see, insert an entry using:

* time: <normalized current timestamp>
* tag:measurement: <measurement of the item being recorded>
* tag:tagset: <tagset of the item being recorded>
* tag:timewindow: <normalized timestamp from the item being recorded>
* value: true (hardcoded, the value is irrelevant)

Then I can query that measurement by time and group by tags, which produces a listing of the time windows for each series (as defined by the measurement/tagset tags) that need to be (re)aggregated. (A rough line-protocol sketch of this is at the end of this message.)

That seems like it would work, at least in theory, but I am concerned that this kind of (ab)use of InfluxDB will generate a ton of tiny series, and I have no idea whether it would handle that at scale - simply storing this data in a set in something like Redis would be much simpler... :-/

Thanks,
-M

On Monday, November 21, 2016 at 2:58:19 PM UTC-8, Sean Beckett wrote:
> In stream mode, Kapacitor holds all data in RAM. Since the data may be
> arriving days late, I don't think that's a tenable solution.
>
> If the data gaps can be identified easily, you just need a Kapacitor batch to
> COUNT() points in an interval. If the number doesn't match, then have
> Kapacitor issue the INTO query.
>
> If your data is not regular, then I don't see how to detect the absence of a
> signal.
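P.S. Here is the rough sketch of the tag-based "set" idea mentioned above, again just illustrative Python building the strings (the measurement name "agg_windows" is made up, real tag values would need line-protocol escaping, and I have not actually tried this):

```python
def window_marker(src_measurement: str, tagset: str,
                  item_window_ns: int, arrival_window_ns: int) -> str:
    """Build a line-protocol entry marking that the window starting at
    `item_window_ns` needs (re)aggregation for the given series. Using the
    item's window as a tag and the normalized arrival time as the point
    timestamp means repeated writes within one aggregation interval collapse
    into a single point - effectively a "set"."""
    return (f"agg_windows,measurement={src_measurement},tagset={tagset},"
            f"timewindow={item_window_ns} seen=true {arrival_window_ns}")

# Later, something like this (grouped by the tags) would list the windows that
# need to be (re)aggregated for each series:
list_windows = ("SELECT LAST(seen) FROM agg_windows WHERE time > now() - 1h "
                "GROUP BY measurement, tagset, timewindow")
```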
