Think I may have found a solution:

If you need to constantly insert data with timestamps in the past and you 
> want to automatically downsample it, you can script the CLI standalone 
> query function 
> <https://www.google.com/url?q=https%3A%2F%2Fdocs.influxdata.com%2Finfluxdb%2Fv0.13%2Ftools%2Fshell%2F%23execute-an-influxql-command-and-quit-with-execute&sa=D&sntz=1&usg=AFQjCNFtMsYtWJjh-W4DuXLfexofg_rYEw>
>  with 
> an INTO query covering the appropriate historical range.


I was initially concerned that I couldn't use the "collection timestamp" 
(which can be any amount of time in the past) since it would fall outside 
of the CQ's time range unless I covered my worst-case scenario with an 
inappropriately-large time range in the RESAMPLE FOR interval.  And I 
couldn't use the "collection timestamp" as a tag, since learned through 
more reading that tags can only be strings, right?

It looks like the approach I should use is to manage it in my ETL 
operation, and if it looks like I'm dealing with a timestamp outside of the 
"normal" range of my CQ, I fire off an INTO query that can cover 
"appropriate historical range".

Am I reading all this correctly? 


On Thursday, November 10, 2016 at 6:30:50 PM UTC-8, John Burk wrote:
>
> Apologies in advance, long time SQL, completely new to InfluxDB, working 
> through the concepts.
>
> Some background first:
>
> I have a data collector which runs at clients' sites and gathers a metric 
> sample every minute, stores in a local journal, and then (barring issues 
> with either internet connectivity or my cloud-based data collection service 
> being unavailable) transmits the batch of records every 15 minutes.  If the 
> collector is unable to transmit the batch after gathering 15 records, it 
> continues to accumulate records in the journal until such time as it can 
> successfully post the entire batch.
>
> The batch can be of any size, so far we've seen clients experience up 12 
> hours of connectivity issues, and then successfully transmit the batch of 
> 700+ records.  I imagine someday we'll see batches that contain 24 hours' 
> worth, or more.  The point is, the batch can be of any size, and the oldest 
> record in the batch can be at any point in the past.
>
> On my end, I store the records in a relational database with two different 
> timestamps:
>
>    - the insertion timestamp (when the service received the batch, the 
>    "reporting" timestamp)
>    - the sample timestamp (timestamp when the single metric record was 
>    collected)
>
> I'm storing these records in an RDBMS, but want to do an 
> extract/transform/load into InfluxDB for charting data.
>
> So when doing the RDBMS -> InfluxDB ETL, I can just pull from the RDBMS 
> based on WHERE insertion_timestamp > my_last_ETL_query.  All well and 
> good.
>
> When everything's going well, all the new datapoints are from the last 15 
> minutes.  But any ETL operation may end up getting recently-inserted data 
> (recently inserted into the RDBMS) whose sample timestamps can be from 
> quite a while back in the past.
>
> I'm looking at how to downsample this data with CQ's, but I don't want to 
> use a ridiculously long duration literal in a RESAMPLE FOR in order to 
> cover my worst-case scenario; I expect that CQ's get more expensive the 
> larger the time range gets relative to the GROUP BY interval.  One of the 
> CQ's I want to write will just downsample GROUP BY time(2m), a factor of 
> 1/2, so a 12-hour (or longer) time range just seems wrong.
>
> I had thought about not specifying the timestamp when doing the insertion 
> into InfluxDB and using the sample timestamp as a tag so that the CQ could 
> tell what was recently inserted, but as far as I can tell you can't perform 
> functions on tags.  Is this correct?
>
>
>

-- 
Remember to include the version number!
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxData" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/a03d511a-5166-475f-8403-4489cc0c8f02%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to