I'm facing an interesting problem with my current single-instance InfluxDB deployment. It's running on an 8-core machine with 8GB of RAM (physical hardware), with InfluxDB v1.1.1 running in a Docker container.
I'm writing 520 points in batches of 1560 every 3 seconds to a retention policy of "1w" with a "1d" shard group duration. Each point contains about 50 fields of data. In total the measurement has 115 fields, so for any given point most of the fields are empty, but across all series every field is used. There's 1 tag in the measurement, with about 520 series.

I've got 1 continuous query configured to run every 3 minutes. The CQ is *massive*. It looks something like this:

CREATE CONTINUOUS QUERY "3m" ON MyDB BEGIN
  SELECT mean(val1) AS val1, mean(val2) AS val2, ... (this continues for ALL 115 fields) ...
  INTO MyDB."16w".devices FROM MyDB."1w".devices
  GROUP BY time(3m), device
END

Surprisingly, I don't think the CQ is causing too much of a performance issue at the moment. Instead, what I'm seeing in the influx logs is the following:

Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s

Not only do those compaction times seem very long (23.5 seconds?), but while that full compaction is being performed I'm getting "timeout" on writes. That is, it starts taking longer than 10 seconds (the default influx HTTP write timeout) for a write to be performed/acknowledged by influx. I've seen the full compaction times hover around 30s consistently, and they seem to happen about once every 30 minutes.

The InfluxDB instance seems to be using all available RAM on the machine. I had to cap the docker container at 6GB of memory in order not to starve the rest of the system of resources.
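For reference, the retention policies involved were created with statements roughly like the ones below. This is a sketch from memory rather than the exact commands I ran; the replication factor and the DEFAULT flag are assumptions, and the "16w" policy is simply the target of the CQ above:

CREATE RETENTION POLICY "1w" ON MyDB DURATION 1w REPLICATION 1 SHARD DURATION 1d DEFAULT
CREATE RETENTION POLICY "16w" ON MyDB DURATION 16w REPLICATION 1

(If I'm reading the docs right, 1d is also the default shard group duration for a 1w retention policy, so the SHARD DURATION clause there may be redundant.)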
Here's a copy of my logs noting very long write times in conjunction with a full compaction occurring on the database.

Process log (write duration is in ms):

Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513

Influx log:

Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms

Is there anything I can do to make these compaction times shorter? Would having smaller shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields causing a problem? I could potentially break the measurement up into multiple measurements so that no one measurement has more than about 50 fields.

Thanks for any suggestions!
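P.S. To make the shard-group question concrete: the change I have in mind would be something like the statement below. This is only a sketch; it assumes the SHARD DURATION clause is accepted by this 1.1.x build, and as far as I understand it would only affect newly created shard groups, not the ones that already exist.

ALTER RETENTION POLICY "1w" ON MyDB SHARD DURATION 1h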
