Thanks Sean, I appreciate your help.

I wrote a quick Python script as a proof of concept yesterday, and it seems
to work OK. I was a little surprised by how long this approach will take, but
I haven't yet profiled the code to see where the time is spent. Before I go
much further, I'd like to verify exactly how data is stored on disk and how
it is sharded.

During this migration I'm consolidating a number of measurements, with part of
the metric name becoming an additional tag. The influxdb-python result set
returns query results as a list of series. Because of that structure, I write
each series as batches of points in chronological order, so I pass over the
same time period roughly 50 times, once per series within the measurement. Is
this a valid approach, or should I interleave the points chronologically and
pass over each time period only once?
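
For reference, the interleaved alternative could look roughly like the sketch
below. It assumes each series' points are already sorted by time and are plain
dicts with a "time" key; the helper names, the sample data, and the batch size
are made up for illustration, and the sketch does not show whether interleaving
actually writes faster than the per-series approach.

```python
import heapq
from itertools import islice


def interleave_by_time(series_points):
    """Merge per-series point lists into one chronologically ordered stream.

    Each input list must already be sorted by its "time" value.
    """
    return heapq.merge(*series_points, key=lambda point: point["time"])


def batches(points, size):
    """Yield successive lists of at most `size` points."""
    iterator = iter(points)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical data: two series of one consolidated measurement, with
# epoch-second timestamps at a 10 second interval.
series_a = [
    {"measurement": "app", "tags": {"metric": "a"}, "time": t, "fields": {"value": 1.0}}
    for t in (0, 10, 20)
]
series_b = [
    {"measurement": "app", "tags": {"metric": "b"}, "time": t, "fields": {"value": 2.0}}
    for t in (0, 10, 20)
]

ordered = list(interleave_by_time([series_a, series_b]))
for batch in batches(ordered, 4):
    # A real script would hand each batch to the client's write call here.
    print([(p["time"], p["tags"]["metric"]) for p in batch])
```

The merge is lazy, so it only needs the head of each series at any moment, but
every series for the chunk being migrated has to be available before the merge
starts.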
 

> On Jul 10, 2016, at 9:52 PM, Sean Beckett <[email protected]> wrote:
> 
> When writing older data, as long as you submit batches of points with 
> timestamps in chronological order, it should still be fairly performant. 
> Backfilling the downsampling with the INTO clause should also be performant 
> provided you restrict the time ranges to perhaps a week at a time.
> 
> We have an internal tool for querying data and replaying it into another 
> instance. I'll see if that's something we can release, or if we can point you 
> to the right code.
> 
> On Sat, Jul 9, 2016 at 4:20 PM, Ryan January <[email protected]> wrote:
> I currently have data I'm trying to migrate from one InfluxDB instance to a 
> newer installation, and I'm not sure of the proper way to do so.
> As a bit of background: We're sending app metrics to statsd, which are 
> passing through telegraf to be persisted in InfluxDB on 10 second intervals. 
> During the migration to the new server I'm taking the opportunity to 
> restructure the schema to account for some shortcomings during the first 
> install. Due to this restructuring I can't perform a simple backup/restore.  
> 
> The old server runs 0.11 and the new one currently runs 0.13; both are on RHEL.
> 
> I'm writing a fairly consistent 20 points per second (total) across 50 
> measurements, each having roughly 130 series.  
> Each measurement has 3 tags and 6 fields.
> No retention policy is currently in place.
> The database is approximately 2 GB and covers a timespan of roughly 1 year.
> 
> The new server will store 10s resolution in a 60 day retention policy. The
> data will also be downsampled to 1m resolution into a 1 year retention
> policy. I'd like to migrate the current 10s samples into the new server and
> its 1 year retention policy.
> I'm planning to backfill the old data after the apps have begun sending 
> metrics to the new server.
> 
> My current plan is to write a script to query 1 minute aggregate data from 
> the old DB in 1 hour chunks, rewriting that data to the new DB.
> The little I know about the TSM engine is from videos posted on the
> influxdata site. They stated that InfluxDB is optimized around the assumption
> that only recent data is modified, and that there may be a write penalty
> as older shards are rewritten.
> 
> How much of an issue is this if no data currently exists in those timeframes? 
> Are there other methods of backfill that I should consider?
> 
> Thank you,
> Ryan
> 
> 
> 
> 
> -- 
> Remember to include the InfluxDB version number with all issue reports
> --- 
> You received this message because you are subscribed to the Google Groups 
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> To post to this group, send email to [email protected] 
> <mailto:[email protected]>.
> Visit this group at https://groups.google.com/group/influxdb 
> <https://groups.google.com/group/influxdb>.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/influxdb/50701a8b-5a8e-4abd-bf60-7dcecb2cf65d%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/influxdb/50701a8b-5a8e-4abd-bf60-7dcecb2cf65d%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
> 
> 
> 
> -- 
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
> 
