Do the values need to be in alphabetical order as well? I'm now sorting tags
alphabetically, and I'm writing about 47k values per second in 5 POST groups.
Unfortunately I still occasionally see timeouts on the writes, and when those
write timeouts start occurring, my queries time out as well. Is there anything
else I could try adjusting?
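
In case it's useful context, here is a stripped-down sketch of what I mean by
"sorting tags" and splitting each second's writes across a few concurrent POSTs.
It's illustrative only (not the real writer code), and the measurement, tag,
and database names are placeholders:

const http = require('http');

// Build one line-protocol point with tag keys sorted alphabetically,
// so the server doesn't have to sort them itself.
function toLine(measurement, tags, fields, tsNs) {
  const tagStr = Object.keys(tags).sort()
    .map(k => `${k}=${tags[k]}`)
    .join(',');
  const fieldStr = Object.keys(fields)
    .map(k => `${k}=${fields[k]}`)
    .join(',');
  return `${measurement},${tagStr} ${fieldStr} ${tsNs}`;
}

// One POST to the /write endpoint (default port, placeholder db name).
function post(lines) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      { host: 'localhost', port: 8086, method: 'POST',
        path: '/write?db=MyDB&precision=ns' },
      res => { res.resume(); resolve(res.statusCode); });
    req.on('error', reject);
    req.end(lines.join('\n'));
  });
}

// Split one second's worth of points into several POSTs and let them
// all be in flight at the same time (node keeps this non-blocking).
async function writeSecond(lines, pointsPerPost) {
  const inFlight = [];
  for (let i = 0; i < lines.length; i += pointsPerPost) {
    inFlight.push(post(lines.slice(i, i + pointsPerPost)));
  }
  return Promise.all(inFlight);
}

At roughly 50 fields per point, 20-40 points per POST works out to the 1k-2k
values-per-request range suggested further down in this thread.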

Thanks,
Jeff

On Tuesday, December 27, 2016 at 3:21:02 PM UTC-6, Paul Dix wrote:
> It shouldn't make a huge difference, but yeah, they should be in key 
> alphabetical order. Otherwise the server ends up having to sort them before 
> writing.
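
(For example, reusing the made-up names from later in this thread, a point
whose tag keys are already in alphabetical order would look like

cpu,datacenter=us-east,host=serverA usage_user=23,usage_system=5

i.e. "datacenter" sorted before "host". Only the tag keys are what's being
talked about here; the names themselves are just placeholders.)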
> 
> 
> On Tue, Dec 27, 2016 at 4:18 PM,  <[email protected]> wrote:
> The data should always be in correct time order. I do not believe tags are in 
> alphabetical order though. I wasn't aware that would make any difference. 
> I'll give that a shot next.
> 
> 
> 
> On Tuesday, December 27, 2016 at 2:45:52 PM UTC-6, Paul Dix wrote:
> 
> > You could break them down into 8 POSTs per second. To be honest I'm not 
> > sure why you're getting timeouts. We test significantly higher load 
> > regularly on more modest hardware (VMs in AWS). Any chance you're writing 
> > data out of time order? Are you posting the tags in alphabetical order?
> >
> > On Mon, Dec 26, 2016 at 2:57 PM, <[email protected]> wrote:
> >
> > So just to make sure, if I'm writing 80k values per second, I want to
> > accomplish that by making 80 POSTs per second with about 1k values per post.
> >
> > The process generating data right now is node.js which is handling all
> > posts asynchronously, but the process itself is single threaded (the posts
> > will all go out sequentially in a loop but in non-blocking fashion).
> >
> > Even with this setup though, I still occasionally get "timeout" error
> > responses back from influx for the posts (which happens when influx doesn't
> > close out the post request within 10 seconds). If I'm not mistaken, that
> > error occurs on the influxDB end and not the client end, correct? I think
> > this is the "write-timeout" setting in the coordinator config section which
> > is documented as:
> >
> > "The time within which a write request must complete on the cluster."
> >
> > I don't know if this applies to a standalone configuration of influx or not
> > though.
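
(Side note for anyone reading later: the setting referred to above sits in the
coordinator section of the 1.1 sample config, roughly as below; the 10s value
is the default. Whether it also governs a standalone, non-clustered instance is
exactly the open question here.)

[coordinator]
  write-timeout = "10s"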
> >
> > I'll continue trying to adjust the WAL cache size settings and see if I
> > can't find what magic values will get it to stop having timeout issues.
> >
> > Thanks,
> > Jeff
> >
> > On Monday, December 26, 2016 at 9:51:06 AM UTC-6, Paul Dix wrote:
> >
> > > 80k values per second should be no problem. We regularly test at > 800k
> > > values/sec and what's on master now will do ~2M values/sec if you're on a
> > > large enough box.
> > >
> > > You should be posting 1k-2k values per post, but have multiple threads or
> > > processes doing it. Concurrency is the key. The total number of
> > > values/sec shouldn't be a problem on your hardware (assuming you're doing
> > > < 100k values/sec).
> > >
> > > On Sat, Dec 24, 2016 at 10:00 AM,  <[email protected]> wrote:
> > >
> > > Is there any kind of guide on how I should be sizing these numbers?
> > >
> > > I tried doubling the cache-max-memory-size and quadrupling the
> > > cache-snapshot-memory-size. I also tried writing fewer values per
> > > request, but that didn't really seem to help.
> > >
> > > I tried switching to writing only 1k, 5k, 8k, 10k, 20k, and 80k values
> > > per request per second. I had my process log out a warning any time
> > > requests took longer than one second to complete since I'm generating
> > > data at a rate of once per second. This resulted in anywhere from 1 to 50
> > > posts per second (I had overestimated the number of values per second I
> > > was writing - it's actually somewhere around 50k). It seemed like
> > > somewhere around 30k values per request actually worked best. I'd very
> > > frequently see multiple requests take longer than 1s when sending 50
> > > posts with 1k values per second. With 30k values per post, the individual
> > > posts would take about 1-3 seconds to complete but seemed to "catch up"
> > > every now and then. That is, it could go for up to 10 seconds without
> > > seeing a request take longer than 1s to complete whereas with 50 posts
> > > per second, I'd always see at least a handful of posts take longer than
> > > 1s in every group.
> > >
> > > I did still see timeout errors with this configuration:
> > >
> > > Dec 24 10:12:37 process [27383]: 2016-12-24T10:12:37.662Z - error: {"error":"timeout"}
> > > Dec 24 10:12:37 process [27383]: 2016-12-24T10:12:37.671Z - warn: db long write duration: 10037
> > > ... (there were about 7 of these in a row, all taking longer than 10s to complete and timing out)
> > >
> > > And this is the compaction log from influx around the same time:
> > >
> > > Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacted full group (0) into /var/lib/influxdb/data/host/1w/1224/000000689-000000005.tsm.tmp (#0)
> > > Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacted full 4 files into 1 files in 2m38.182651921s
> > > Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 beginning full compaction of group 0, 2 TSM files
> > > Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacting full group (0) /var/lib/influxdb/data/host/1w/1224/000000496-000000006.tsm (#0)
> > > Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacting full group (0) /var/lib/influxdb/data/host/1w/1224/000000689-000000005.tsm (#1)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 beginning level 1 compaction of group 0, 6 TSM files
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#0)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#1)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#2)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#3)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#4)
> > > Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#5)
> > > Dec 24 10:12:16 host influxdb[26479]: [tsm1] 2016/12/24 10:12:16 Snapshot for path /var/lib/influxdb/data/host/1w/1224 written in 4.217841051s
> > > Dec 24 10:12:26 host influxdb[26479]: [tsm1] 2016/12/24 10:12:26 compacted level 1 group (0) into /var/lib/influxdb/data/host/1w/1224/000000691-000000002.tsm.tmp (#0)
> > > Dec 24 10:12:26 host influxdb[26479]: [tsm1] 2016/12/24 10:12:26 compacted level 1 6 files into 1 files in 10.458949133s
> > > Dec 24 10:14:22 host influxdb[26479]: [tsm1] 2016/12/24 10:14:22 compacted full group (0) into /var/lib/influxdb/data/host/1w/1224/000000689-000000006.tsm.tmp (#0)
> > > Dec 24 10:14:22 host influxdb[26479]: [tsm1] 2016/12/24 10:14:22 compacted full 2 files into 1 files in 4m11.403244468s
> > >
> > > Compaction times have definitely gone up there at 4 minutes and 11 seconds.
> > >
> > > Do you have any further suggestions of how I can "tune" influx for
> > > handling this large and fast volume of writes? I could send posts less
> > > frequently, but it's still the same amount of data. So if I did posts
> > > every 3 seconds, I would have to send 3x the number of requests every 3
> > > seconds.
> > >
> > > Most queries that run against the DB are for realtime charts (similar to
> > > grafana) which are displaying a 5 or 10 minute window of 1s data for a
> > > small number of values and tags. These queries seem to be pretty
> > > performant (only taking about 70ms for a batch of 5 queries).
> > >
> > > I'm still not seeing any bottlenecks in terms of memory or CPU (as in I
> > > never see either of them really spike or max out). The harddrive is a
> > > modern SSD and we recently increased the RAM to 16GB. I'm not sure what's
> > > causing the long write times, or if it's just a combination of queries,
> > > the continuous query, and compaction that's giving it a hard time.
> > >
> > > Thanks again for the help so far!
> > >
> > > On Thursday, December 22, 2016 at 9:45:48 AM UTC-6, Paul Dix wrote:
> > >
> > > > You might try breaking it up further. We generally do performance tests
> > > > with 1k-10k values per request. You can set the WAL snapshotting sizes
> > > > here:
> > > > https://github.com/influxdata/influxdb/blob/master/etc/config.sample.toml#L62-L68
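
(For anyone else landing on this thread: the settings behind that link live in
the [data] section of the config. As I read the 1.1 sample file, the relevant
keys and approximate defaults are the following; double-check the sample file
for your exact version.)

[data]
  cache-max-memory-size = 1048576000
  cache-snapshot-memory-size = 26214400
  cache-snapshot-write-cold-duration = "10m"
  compact-full-write-cold-duration = "4h"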
> > >
> > > > On Wed, Dec 21, 2016 at 2:40 PM,  <[email protected]> wrote:
> > > >
> > > > I'm writing approximately 78,000 values per request (about 50 values
> > > > per point with 1560 points every 3 seconds). I saw similar behavior
> > > > when writing 26,000 values per request every 1 second.
> > > >
> > > > Should I try breaking those up into smaller writes instead of larger
> > > > ones?
> > > >
> > > > How can I adjust the max WAL cache size? I don't see that as an
> > > > available configuration option in v1.1:
> > > > https://docs.influxdata.com/influxdb/v1.1/administration/config#environment-variables
> > > >
> > > > Thanks!
> > > >
> > > > On Wednesday, December 21, 2016 at 11:20:46 AM UTC-8, Paul Dix wrote:
> > > >
> > > > > Compactions shouldn't cause write timeouts. I would suspect that
> > > > > write timeouts are happening because you're posting too many values
> > > > > per request. You can also try increasing the max WAL cache size.
> > > > >
> > > > > How many actual values are you writing per request? That is field
> > > > > values. For example:
> > > > >
> > > > > cpu,host=serverA usage_user=23,usage_system=5
> > > > >
> > > > > Represents 2 values posted, not one. That might help narrow things
> > > > > down.
> > > > >
> > > > > On Wed, Dec 21, 2016 at 1:09 PM, Jeff <[email protected]> wrote:
> > > > >
> > > > > Facing an interesting problem with my current InfluxDB single
> > > > > instance deployment. I'm running on an 8 core machine with 8GB RAM
> > > > > (physical hardware) with InfluxDB v1.1.1 running in a docker
> > > > > container.
> > > > >
> > > > > I'm writing 520 points per second, in batches of 1560 points every 3
> > > > > seconds, to a Retention Policy of "1w" with a "1d" shard group
> > > > > duration. Each point contains about 50 fields of data. In total the
> > > > > measurement has 115 fields, so for any given point most of the fields
> > > > > are empty, but across all series every field is used.
> > > > >
> > > > > There's 1 tag in the measurement with about 520 series. I've got 1
> > > > > ContinuousQuery configured to run every 3 minutes. The CQ is
> > > > > *massive*. It looks something like this:
> > > > >
> > > > > "CREATE CONTINUOUS QUERY "\"3m\"" ON MyDB BEGIN SELECT mean(val1) AS val1, mean(val2) AS val2, .... this continues for ALL 115 fields ... INTO MyDB."16w".devices FROM MyDB."1w".devices GROUP BY time(3m), device END"
> > > > >
> > > > > Surprisingly I don't think the CQ is causing too much of a
> > > > > performance issue at the moment. Instead what I'm seeing in the
> > > > > influx logs is the following:
> > > > >
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
> > > > > Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
> > > > > Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s
> > > > >
> > > > > Not only do those compaction times seem very long (23.5 seconds?) but
> > > > > while that full compaction is being performed, I'm getting "timeout"
> > > > > on writes. That is, it starts taking longer than 10 seconds (default
> > > > > influx http write timeout) for the write to be performed/acknowledged
> > > > > by influx. I've seen the full compaction times hover around 30s
> > > > > consistently and seem to happen about once every 30 minutes.
> > > > >
> > > > > The influxDB instance seems to be using all available RAM on the
> > > > > machine. I had to cap the docker container at 6GB memory usage in
> > > > > order to not starve the rest of the system of resources.
> > > > >
> > > > > Here's a copy of my logs noting very long write times in conjunction
> > > > > with a full compaction occurring on the database:
> > > > >
> > > > > Process log (write duration is in ms):
> > > > >
> > > > > Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
> > > > > Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
> > > > > Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
> > > > > Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
> > > > > Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
> > > > > Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513
> > > > >
> > > > > Influx log:
> > > > >
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
> > > > > Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
> > > > > Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms
> > > > >
> > > > > Is there anything I can do to help these compaction times be shorter?
> > > > > Would having smaller shard groups (maybe 1h instead of 1d) help? Is
> > > > > the sheer number of fields causing a problem? I could potentially
> > > > > break up the measurement into multiple such that no one measurement
> > > > > has more than about 50 fields.
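
(Side note, in case it helps anyone who finds this thread: shard group duration
is set per retention policy, and if I remember the 1.x syntax right it can be
changed with something like the statement below. It only affects shard groups
created after the change; the "1w"/"MyDB" names are just the ones from this
thread.)

ALTER RETENTION POLICY "1w" ON "MyDB" SHARD DURATION 1h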
> > > > >
> > > > > Thanks for any suggestions!