Thanks Sean, I appreciate your help. I wrote a quick Python script as a proof of concept yesterday, and it seems to work OK. I was a little surprised by how long this approach will take, but I haven't yet profiled the code to see where the time is spent. Before I go much further, I'd like to verify exactly how data is stored on disk and how it is sharded.
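
Since the script has not been profiled yet, one low-effort way to see where the time goes is Python's built-in cProfile module. The sketch below is only an illustration under stated assumptions: `migrate_chunk` is a hypothetical stand-in for one query/transform/write cycle, not the actual migration script, and `profile_call` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def migrate_chunk(points):
    """Hypothetical stand-in for one query/transform/write cycle of the script."""
    return [dict(point, migrated=True) for point in points]


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text).

    The report is sorted by cumulative time and limited to ten entries.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
    return result, report.getvalue()


if __name__ == "__main__":
    sample = [{"time": t, "value": t * 2} for t in range(1000)]
    _, text = profile_call(migrate_chunk, sample)
    print(text)
```

Wrapping the real query and write calls the same way should show whether the time is going to the reads from the old server, the reshaping in Python, or the writes to the new server.
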
During this migration I'm consolidating a number of measurements, with part of the metric name becoming an additional tag. The influxdb-python result set returns query results as a list of series. Due to that structure I am writing batches of points in chronological order, but repeating that same chronological period roughly 50 times across different series within the measurement. Is this a valid approach, or should I interleave the points chronologically, only passing over the same time period once?

> On Jul 10, 2016, at 9:52 PM, Sean Beckett <[email protected]> wrote:
>
> When writing older data, as long as you submit batches of points with
> timestamps in chronological order, it should still be fairly performant.
> Backfilling the downsampling with the INTO clause should also be performant
> provided you restrict the time ranges to a perhaps a week at a time.
>
> We have an internal tool for querying data and replaying it into another
> instance. I'll see if that's something we can release, or if we can point you
> to the right code.
>
> On Sat, Jul 9, 2016 at 4:20 PM, Ryan January <[email protected]> wrote:
> I currently have data I'm trying to migrate from one InfluxDB instance to a
> newer installation and not sure of the proper way to do so.
> As a bit of background: We're sending app metrics to statsd, which are
> passing through telegraf to be persisted in InfluxDB on 10 second intervals.
> During the migration to the new server I'm taking the opportunity to
> restructure the schema to account for some shortcomings during the first
> install. Due to this restructuring I can't perform a simple backup/restore.
>
> The old server is .11, new is currently using .13, both using RHEL
>
> I'm writing a fairly consistent 20 points per second (total) across 50
> measurements, each having roughly 130 series.
> Each measurement has 3 tags and 6 fields.
> No retention policy is currently in place.
> The database is approximately 2 gb and covers a timespan of roughly 1 year.
>
> The new server will store 10s resolution in a 60 day retention policy, the
> data will also be downsampled to 1m resolution into a 1 year retention
> policy. I'd like to migrate the current 10s samples into the new server, and
> 1 year retention policy.
> I'm planning to backfill the old data after the apps have begun sending
> metrics to the new server.
>
> My current plan is to write a script to query 1 minute aggregate data from
> the old DB in 1 hour chunks, rewriting that data to the new DB.
> The little I know about the TSM engine is from videos posted on the
> influxdata site. It stated that InfluxDB was optimized around the thought
> that we're only modifying recent data, and that there may be a write penalty
> as older shards are rewritten.
>
> How much of an issue is this if no data currently exists in those timeframes?
> Are there other methods of backfill that I should consider?
>
> Thank you,
> Ryan
>
> --
> Remember to include the InfluxDB version number with all issue reports
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/50701a8b-5a8e-4abd-bf60-7dcecb2cf65d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
> --
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
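
To make the write-order question at the top of this message concrete, here is a minimal sketch of the interleaved alternative: merging the per-series point lists into one chronological stream so that each time period is passed over only once. It assumes each series is already sorted by time and that each point is a dict with a sortable `time` value. The helper names `interleave_series` and `batches` and the sample data are invented for illustration. It does not show whether the server handles this ordering any faster, which is the open question.

```python
import heapq
from itertools import islice


def interleave_series(series_list):
    """Merge per-series point lists into one chronological stream.

    Each input list must already be sorted by its "time" value.
    """
    return heapq.merge(*series_list, key=lambda point: point["time"])


def batches(points, size):
    """Yield successive lists of at most `size` points."""
    iterator = iter(points)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Two hypothetical series from one measurement, at 10 second intervals.
    series_a = [{"time": t, "tags": {"host": "a"}, "fields": {"value": 1.0}}
                for t in (0, 10, 20)]
    series_b = [{"time": t, "tags": {"host": "b"}, "fields": {"value": 2.0}}
                for t in (0, 10, 20)]
    for batch in batches(interleave_series([series_a, series_b]), 4):
        print([(p["time"], p["tags"]["host"]) for p in batch])
```

The per-series approach described above corresponds to writing `series_a` in full and then `series_b` in full. With the merge, each batch covers a narrow time window across all series instead.
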
