Apologies in advance, long time SQL, completely new to InfluxDB, working through the concepts.
Some background first: I have a data collector which runs at clients' sites and gathers a metric sample every minute, stores in a local journal, and then (barring issues with either internet connectivity or my cloud-based data collection service being unavailable) transmits the batch of records every 15 minutes. If the collector is unable to transmit the batch after gathering 15 records, it continues to accumulate records in the journal until such time as it can successfully post the entire batch. The batch can be of any size, so far we've seen clients experience up 12 hours of connectivity issues, and then successfully transmit the batch of 700+ records. I imagine someday we'll see batches that contain 24 hours' worth, or more. The point is, the batch can be of any size, and the oldest record in the batch can be at any point in the past. On my end, I store the records in a relational database with two different timestamps: - the insertion timestamp (when the service received the batch, the "reporting" timestamp) - the sample timestamp (timestamp when the single metric record was collected) I'm storing these records in an RDBMS, but want to do an extract/transform/load into InfluxDB for charting data. So when doing the RDBMS -> InfluxDB ETL, I can just pull from the RDBMS based on WHERE insertion_timestamp > my_last_ETL_query. All well and good. When everything's going well, all the new datapoints are from the last 15 minutes. But any ETL operation may end up getting recently-inserted data (recently inserted into the RDBMS) whose sample timestamps can be from quite a while back in the past. I'm looking at how to downsample this data with CQ's, but I don't want to use a ridiculously long duration literal in a RESAMPLE FOR in order to cover my worst-case scenario; I expect that CQ's get more expensive the larger the time range gets relative to the GROUP BY interval. One of the CQ's I want to write will just downsample GROUP BY time(2m), a factor of 1/2, so a 12-hour (or longer) time range just seems wrong. I had thought about not specifying the timestamp when doing the insertion into InfluxDB and using the sample timestamp as a tag so that the CQ could tell what was recently inserted, but as far as I can tell you can't perform functions on tags. Is this correct? -- Remember to include the version number! --- You received this message because you are subscribed to the Google Groups "InfluxData" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/influxdb. To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/dc6d5328-fffe-4ef1-a249-f86d73b122bb%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
