It looks like your CQ does lead to RAM spikes close to the capacity of the box. I believe your shard durations are what's tipping it over the edge. With 1 day shards in a 90 day retention policy, there are a lot of housekeeping tasks to do each night at midnight UTC. When each shard expires, the series index has to be updated and a series of compactions kicks off. Compactions are RAM and CPU intensive.
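
For reference, changing the shard duration on an existing retention policy is a single InfluxQL statement. A minimal sketch, assuming the `macdb` database and `three_months` policy shown in the SHOW RETENTION POLICIES output quoted further down, and picking one week as an example target:

```sql
-- Assumed names: "macdb" and "three_months" come from the quoted output.
-- The 1w value is only an example; a longer duration such as 4w is also valid.
ALTER RETENTION POLICY "three_months" ON "macdb" SHARD DURATION 1w

-- Check the result: shardGroupDuration for three_months should read 168h0m0s.
SHOW RETENTION POLICIES ON "macdb"
```

The `--` comment lines are for reading only; drop them if your version of the influx shell rejects them. As far as I know, the new duration applies only to shard groups created after the change, so the existing 1 day shards stay as they are until they expire.
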
First recommendation: use ALTER RETENTION POLICY to raise the shard duration for `three_months` to at least a week, but even a month would be good. It will reduce the frequency of the TSM compactions, and with fewer files the compactions will be less resource intensive. Also, queries should touch as few shards as possible. If you are often querying for more than 12 hours of data, then raising the shard duration will reduce the RAM needs of those queries.

On Fri, Oct 14, 2016 at 3:29 AM, <[email protected]> wrote:

> Hi Sean,
>
> here is the graph from our NMS about memory usage
>
> https://s18.postimg.org/a6buyzna1/memory.png
>
> and I would say we have spikes, but they are not every 24h but rather
> every 30 minutes and I guess it's because of our CQ we use for
> downsampling. I can post that CQ if that can help.
>
> We are aware of cardinality when we designed our solution and currently we
> have 81761 series which I guess it's quite ok for this amount of RAM.
>
> SHOW RETENTION POLICIES ON macdb
> name          duration   shardGroupDuration  replicaN  default
> default       0          168h0m0s            1         false
> seven_days    168h0m0s   24h0m0s             1         true
> three_months  2160h0m0s  24h0m0s             1         false
>
> Yes, we always have successful writes and we have 204 response returned by
> InfluxDB. By missing the whole measurement I mean that for example we have
> one measurement at 10:00 pm in InfluxDB and in our file (we do write data
> in file for debugging), then we have next measurement at 10:05 pm in both
> InfluxDB and file, the next measurement at 10:10 we are missing in InfluxDB
> but we do have that measurement in file and still we have 204 response
> returned by InfluxDB.
>
> --
> Remember to include the version number!
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxData" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/ba384d32-335f-4db8-b4f9-f43b584dba25%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Sean Beckett
Director of Support and Professional Services
InfluxDB
