I'm facing an interesting problem with my current single-instance InfluxDB deployment. It's running on an 8-core machine with 8GB of RAM (physical hardware), with InfluxDB v1.1.1 running in a Docker container.
I'm writing 520 points in batches of 1560 every 3 seconds to a retention policy of "1w" with a "1d" shard group duration. Each point contains about 50 fields of data. In total the measurement has 115 fields, so for any given point most of the fields are empty, but across all series every field is used. There's 1 tag in the measurement, with about 520 series.

I've got 1 continuous query configured to run every 3 minutes. The CQ is *massive*. It looks something like this:

CREATE CONTINUOUS QUERY "3m" ON MyDB BEGIN
  SELECT mean(val1) AS val1, mean(val2) AS val2, ... (this continues for ALL 115 fields) ...
  INTO MyDB."16w".devices FROM MyDB."1w".devices
  GROUP BY time(3m), device
END

Surprisingly, I don't think the CQ is causing too much of a performance issue at the moment. Instead, what I'm seeing in the influx logs is the following:

Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s

Not only do those compaction times seem very long (23.5 seconds?), but while that full compaction is being performed I'm getting "timeout" on writes. That is, it starts taking longer than 10 seconds (the default influx HTTP write timeout) for a write to be performed/acknowledged by influx. I've seen the full compaction times hover around 30s consistently, and they seem to happen about once every 30 minutes.

The InfluxDB instance seems to be using all available RAM on the machine. I had to cap the docker container at 6GB of memory in order not to starve the rest of the system of resources.
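For reference, the retention policies involved were created with statements roughly like the ones below. This is a sketch from memory rather than the exact commands I ran; the replication factor and the DEFAULT flag are assumptions, and the "16w" policy is simply the target of the CQ above:

CREATE RETENTION POLICY "1w" ON MyDB DURATION 1w REPLICATION 1 SHARD DURATION 1d DEFAULT
CREATE RETENTION POLICY "16w" ON MyDB DURATION 16w REPLICATION 1

(If I'm reading the docs right, 1d is also the default shard group duration for a 1w retention policy, so the SHARD DURATION clause there may be redundant.)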
Here's a copy of my logs noting very long write times in conjunction with a full compaction occurring on the database.

Process log (write duration is in ms):

Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513

Influx log:

Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms

Is there anything I can do to make these compaction times shorter? Would having smaller shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields causing a problem? I could potentially break the measurement up into multiple measurements so that no one measurement has more than about 50 fields.

Thanks for any suggestions!
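P.S. To make the shard-group question concrete: the change I have in mind would be something like the statement below. This is only a sketch; it assumes the SHARD DURATION clause is accepted by this 1.1.x build, and as far as I understand it would only affect newly created shard groups, not the ones that already exist.

ALTER RETENTION POLICY "1w" ON MyDB SHARD DURATION 1h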
