I'm writing approximately 78,000 values per request (about 50 values per point, with 1560 points every 3 seconds). I saw similar behavior when writing 26,000 values per request every 1 second.

Should I try breaking those up into smaller writes instead of larger ones?
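If smaller writes are the way to go, this is roughly what I have in mind: chunk the line-protocol payload and POST each chunk to the v1 HTTP /write endpoint separately, instead of sending all ~1560 points in one request. A minimal Python sketch (the chunk size, URL, and example point are placeholders I haven't tested against this workload):

    import requests

    INFLUX_URL = "http://localhost:8086/write"  # assumption: default v1 HTTP port
    DB = "MyDB"                                 # database name from the CQ below
    CHUNK_SIZE = 200                            # placeholder; would need tuning

    def write_points(lines, chunk_size=CHUNK_SIZE):
        """Split a list of line-protocol strings into several smaller POSTs."""
        for i in range(0, len(lines), chunk_size):
            chunk = lines[i:i + chunk_size]
            resp = requests.post(
                INFLUX_URL,
                params={"db": DB, "rp": "1w", "precision": "s"},
                data="\n".join(chunk).encode("utf-8"),
                timeout=10,
            )
            resp.raise_for_status()  # InfluxDB returns 204 on a successful write

    # e.g. write_points(["devices,device=dev001 val1=23,val2=5 1482345600", ...])
    # "dev001" and the timestamp are made-up example values.

At 200 points per chunk, each 3-second batch would become roughly eight smaller requests instead of one.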
How can I adjust the max WAL cache size? I don't see that as an available configuration option in v1.1: https://docs.influxdata.com/influxdb/v1.1/administration/config#environment-variables
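Unless the setting Paul means is cache-max-memory-size under [data]? If that's the one, I'm guessing it would look something like this in influxdb.conf (sizes in bytes; the values are examples, not recommendations), or the matching INFLUXDB_DATA_CACHE_MAX_MEMORY_SIZE environment variable for the Docker container:

    [data]
      # assumed TSM cache settings; example values only
      cache-max-memory-size = 2097152000       # cache ceiling before writes start being rejected
      cache-snapshot-memory-size = 26214400    # size at which the cache is snapshotted to TSM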
Thanks!

On Wednesday, December 21, 2016 at 11:20:46 AM UTC-8, Paul Dix wrote:
> Compactions shouldn't cause write timeouts. I would suspect that write timeouts are happening because you're posting too many values per request. You can also try increasing the max WAL cache size.
>
> How many actual values are you writing per request? That is, field values. For example:
>
> cpu,host=serverA usage_user=23,usage_system=5
>
> represents 2 values posted, not one. That might help narrow things down.
>
> On Wed, Dec 21, 2016 at 1:09 PM, Jeff <[email protected]> wrote:
> Facing an interesting problem with my current InfluxDB single-instance deployment. I'm running on an 8-core machine with 8 GB RAM (physical hardware), with InfluxDB v1.1.1 running in a Docker container.
>
> I'm writing 520 points in batches of 1560 every 3 seconds to a retention policy of "1w" with a "1d" shard group duration. Each point contains about 50 fields of data. The measurement has 115 fields in total, so for any given point most of the fields are empty, but across all series every field is used.
>
> There's 1 tag in the measurement, with about 520 series. I've got 1 continuous query configured to run every 3 minutes. The CQ is *massive*. It looks something like this:
>
> CREATE CONTINUOUS QUERY "3m" ON MyDB BEGIN SELECT mean(val1) AS val1, mean(val2) AS val2, .... this continues for ALL 115 fields ... INTO MyDB."16w".devices FROM MyDB."1w".devices GROUP BY time(3m), device END
>
> Surprisingly, I don't think the CQ is causing too much of a performance issue at the moment. Instead, what I'm seeing in the influx logs is the following:
>
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s
>
> Not only do those compaction times seem very long (23.5 seconds?), but while that full compaction is being performed I'm getting "timeout" on writes. That is, it starts taking longer than 10 seconds (the default InfluxDB HTTP write timeout) for the write to be performed/acknowledged by InfluxDB. I've seen the full compaction times hover around 30 s consistently, and they seem to happen about once every 30 minutes.
>
> The InfluxDB instance seems to be using all available RAM on the machine. I had to cap the Docker container at 6 GB of memory in order to not starve the rest of the system of resources.
>
> Here's a copy of my logs noting very long write times in conjunction with a full compaction occurring on the database.
>
> Process log (write duration is in ms):
>
> Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
> Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
> Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
> Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
> Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
> Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513
>
> Influx log:
>
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
> Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
> Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms
>
> Is there anything I can do to make these compaction times shorter? Would having smaller shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields causing a problem? I could potentially break up the measurement into multiple measurements, so that no single measurement has more than about 50 fields.
>
> Thanks for any suggestions!
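One more note on the last question quoted above: if breaking up the measurement turns out to be the right call, the split I'm picturing looks roughly like this (a rough Python sketch; the measurement names devices_a/devices_b/devices_c and the field groupings are made up for illustration):

    # Hypothetical split of the 115-field "devices" measurement into smaller
    # measurements, each carrying at most ~50 fields.
    FIELD_GROUPS = {
        "devices_a": ["val1", "val2", "val3"],      # ...first ~50 fields
        "devices_b": ["val51", "val52", "val53"],   # ...next ~50 fields
        "devices_c": ["val101", "val102"],          # ...remaining fields
    }

    def to_line_protocol(device, fields, timestamp):
        """Emit one line-protocol line per sub-measurement for a single point.
        Assumes numeric (float) field values."""
        lines = []
        for measurement, names in FIELD_GROUPS.items():
            present = {k: v for k, v in fields.items() if k in names and v is not None}
            if not present:
                continue  # this point has no data for this sub-measurement
            field_str = ",".join("{}={}".format(k, v) for k, v in present.items())
            lines.append("{},device={} {} {}".format(measurement, device, field_str, timestamp))
        return lines

No idea yet whether that would actually change the compaction behavior, but it would at least keep any single measurement's schema to about 50 fields.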
