Think I may have found a solution: If you need to constantly insert data with timestamps in the past and you > want to automatically downsample it, you can script the CLI standalone > query function > <https://www.google.com/url?q=https%3A%2F%2Fdocs.influxdata.com%2Finfluxdb%2Fv0.13%2Ftools%2Fshell%2F%23execute-an-influxql-command-and-quit-with-execute&sa=D&sntz=1&usg=AFQjCNFtMsYtWJjh-W4DuXLfexofg_rYEw> > with > an INTO query covering the appropriate historical range.
I was initially concerned that I couldn't use the "collection timestamp" (which can be any amount of time in the past) since it would fall outside of the CQ's time range unless I covered my worst-case scenario with an inappropriately-large time range in the RESAMPLE FOR interval. And I couldn't use the "collection timestamp" as a tag, since learned through more reading that tags can only be strings, right? It looks like the approach I should use is to manage it in my ETL operation, and if it looks like I'm dealing with a timestamp outside of the "normal" range of my CQ, I fire off an INTO query that can cover "appropriate historical range". Am I reading all this correctly? On Thursday, November 10, 2016 at 6:30:50 PM UTC-8, John Burk wrote: > > Apologies in advance, long time SQL, completely new to InfluxDB, working > through the concepts. > > Some background first: > > I have a data collector which runs at clients' sites and gathers a metric > sample every minute, stores in a local journal, and then (barring issues > with either internet connectivity or my cloud-based data collection service > being unavailable) transmits the batch of records every 15 minutes. If the > collector is unable to transmit the batch after gathering 15 records, it > continues to accumulate records in the journal until such time as it can > successfully post the entire batch. > > The batch can be of any size, so far we've seen clients experience up 12 > hours of connectivity issues, and then successfully transmit the batch of > 700+ records. I imagine someday we'll see batches that contain 24 hours' > worth, or more. The point is, the batch can be of any size, and the oldest > record in the batch can be at any point in the past. > > On my end, I store the records in a relational database with two different > timestamps: > > - the insertion timestamp (when the service received the batch, the > "reporting" timestamp) > - the sample timestamp (timestamp when the single metric record was > collected) > > I'm storing these records in an RDBMS, but want to do an > extract/transform/load into InfluxDB for charting data. > > So when doing the RDBMS -> InfluxDB ETL, I can just pull from the RDBMS > based on WHERE insertion_timestamp > my_last_ETL_query. All well and > good. > > When everything's going well, all the new datapoints are from the last 15 > minutes. But any ETL operation may end up getting recently-inserted data > (recently inserted into the RDBMS) whose sample timestamps can be from > quite a while back in the past. > > I'm looking at how to downsample this data with CQ's, but I don't want to > use a ridiculously long duration literal in a RESAMPLE FOR in order to > cover my worst-case scenario; I expect that CQ's get more expensive the > larger the time range gets relative to the GROUP BY interval. One of the > CQ's I want to write will just downsample GROUP BY time(2m), a factor of > 1/2, so a 12-hour (or longer) time range just seems wrong. > > I had thought about not specifying the timestamp when doing the insertion > into InfluxDB and using the sample timestamp as a tag so that the CQ could > tell what was recently inserted, but as far as I can tell you can't perform > functions on tags. Is this correct? > > > -- Remember to include the version number! --- You received this message because you are subscribed to the Google Groups "InfluxData" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/influxdb. To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/a03d511a-5166-475f-8403-4489cc0c8f02%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
