On Thursday, June 16, 2016 at 6:16:46 PM UTC-4, Sean Beckett wrote:
> On Thu, Jun 16, 2016 at 2:14 PM, UW <[email protected]> wrote:
> > Hello, we are collecting data from hundreds of clients and each data
> > collection period lasts several weeks. After three months we would like to
> > move older data to the archive.
> >
> > Questions:
> >
> > 1. Since there will be no queries across clients, does it make sense to set
> > up a separate DB to encapsulate data for each client?
>
> Probably not. That will lead to a lot of duplicated index data for little
> gain. It sounds to me like client should be a tag, not a database or
> measurement.
>
> > 2. What is the performance impact of having hundreds or thousands of DBs and
> > are there any scalability guidelines for this?
>
> Each database must have its own index. If you have one database with 5 tags,
> each with 2 fully independent values, you've got 2^5 = 32 unique series. If
> you have 100 databases, each with the identical 5 tags, 2 values each, now you
> have 3,200 unique series. Since you're talking about thousands of databases,
> that leads to a much higher series cardinality, which strongly impacts RAM
> needs. In addition, the query engine has to maintain some working RAM for
> every database. That adds up when you're talking about thousands of DBs.
>
> Also, points are stored in files per retention policy, per database. If you
> have 1000 databases with just 1 point each, that still means 1000 shard files
> on disk. It's not a huge issue but it does affect performance.
>
> > 3. If a DB is moved to archive, how quickly can it be re-mounted if data
> > needs to be analyzed again?
>
> InfluxDB does not yet support multiple file paths for data. All the data is
> stored in one place, and there is no concept of warm or cold storage. All
> data is always accessible.
>
> InfluxDB does support automated expiry of data, and also automated
> downsampling of high-precision data into lower-precision data. Does that meet
> your needs?
>
> InfluxDB is very space efficient, so I'm not sure there's any reason to want
> to archive the data. Each numeric value takes less than 2 bytes on disk when
> fully compacted, so unless each client is storing billions of points you
> should be fine for space on even a small SSD.
>
> > Thank you.
> >
> > Mark
> >
> > --
> > Remember to include the InfluxDB version number with all issue reports
> > ---
> > You received this message because you are subscribed to the Google Groups
> > "InfluxDB" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at https://groups.google.com/group/influxdb.
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/influxdb/7316123c-846f-4c4b-ba5d-4c7ccf09bc39%40googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
>
> --
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
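
To make the arithmetic in the quoted reply concrete, here is a minimal Python sketch. It assumes tag values vary fully independently, so five tags with two values each give 2**5 = 32 possible series per database, and the total grows linearly with the number of databases. The function names are hypothetical, and the 2-bytes-per-value figure is simply the upper bound quoted in the reply, not a measured number.

```python
def series_per_database(values_per_tag):
    """Upper bound on unique series: the product of each tag's value count.

    Assumes every combination of tag values actually occurs.
    """
    total = 1
    for count in values_per_tag:
        total *= count
    return total


def total_series(values_per_tag, databases):
    """Series summed across databases that each hold the same tag set."""
    return series_per_database(values_per_tag) * databases


def compacted_size_bytes(points, bytes_per_value=2):
    """Rough on-disk ceiling, using the 'less than 2 bytes' figure as a bound."""
    return points * bytes_per_value


tags = [2, 2, 2, 2, 2]  # 5 tags, 2 independent values each
print(series_per_database(tags))          # 32
print(total_series(tags, 100))            # 3200
print(compacted_size_bytes(10**9) / 1e9)  # 2.0 (GB for a billion values)
```

With client as a tag in a single database, the same series still exist, but they share one index instead of being spread across many per-database indexes.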
Sean, thank you for a quick response. I have three follow-up questions:

1. How much RAM is required for each working database?
2. How does the number of shard files affect performance?
3. If we use a single database, how difficult would it be to purge data
relating to a particular tag (e.g. client)?

Thank you.

Mark
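
On the third question, one possible approach in a single-database layout is InfluxQL's `DROP SERIES` statement with a `WHERE` clause on the tag. The Python sketch below only builds the statement text and does not execute it. The helper name, the tag key `client`, and the value `acme` are hypothetical, and whether the statement behaves acceptably at this scale on the InfluxDB version in use is an assumption to verify against its documentation.

```python
def drop_series_for_tag(tag_key, tag_value):
    """Build an InfluxQL statement that drops every series carrying a tag value.

    To stay conservative, this sketch refuses quotes and backslashes rather
    than guessing at escaping rules.
    """
    for text in (tag_key, tag_value):
        if not text or any(ch in text for ch in "'\"\\"):
            raise ValueError(f"unsupported characters in {text!r}")
    # Identifiers are double-quoted and string literals single-quoted in InfluxQL.
    return f"DROP SERIES WHERE \"{tag_key}\" = '{tag_value}'"


print(drop_series_for_tag("client", "acme"))
# DROP SERIES WHERE "client" = 'acme'
```

The resulting string would be run through the `influx` shell or a client library against the database that holds the data.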
