I'm writing approximately 78,000 values per request (about 50 values per point, with 1560 points every 3 seconds). I saw similar behavior when writing 26,000 values per request every 1 second.

Should I try breaking those up into smaller writes instead of larger ones?
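If smaller writes are the way to go, this is roughly what I have in mind: chunk the line-protocol payload and POST each chunk to the v1 HTTP /write endpoint separately, instead of sending all ~1560 points in one request. A minimal Python sketch (the chunk size, URL, and example point are placeholders I haven't tested against this workload):

    import requests

    INFLUX_URL = "http://localhost:8086/write"  # assumption: default v1 HTTP port
    DB = "MyDB"                                 # database name from the CQ below
    CHUNK_SIZE = 200                            # placeholder; would need tuning

    def write_points(lines, chunk_size=CHUNK_SIZE):
        """Split a list of line-protocol strings into several smaller POSTs."""
        for i in range(0, len(lines), chunk_size):
            chunk = lines[i:i + chunk_size]
            resp = requests.post(
                INFLUX_URL,
                params={"db": DB, "rp": "1w", "precision": "s"},
                data="\n".join(chunk).encode("utf-8"),
                timeout=10,
            )
            resp.raise_for_status()  # InfluxDB returns 204 on a successful write

    # e.g. write_points(["devices,device=dev001 val1=23,val2=5 1482345600", ...])
    # "dev001" and the timestamp are made-up example values.

At 200 points per chunk, each 3-second batch would become roughly eight smaller requests instead of one.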
How can I adjust the max WAL cache size? I don't see that as an available configuration option in v1.1: https://docs.influxdata.com/influxdb/v1.1/administration/config#environment-variables
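Unless the setting Paul means is cache-max-memory-size under [data]? If that's the one, I'm guessing it would look something like this in influxdb.conf (sizes in bytes; the values are examples, not recommendations), or the matching INFLUXDB_DATA_CACHE_MAX_MEMORY_SIZE environment variable for the Docker container:

    [data]
      # assumed TSM cache settings; example values only
      cache-max-memory-size = 2097152000       # cache ceiling before writes start being rejected
      cache-snapshot-memory-size = 26214400    # size at which the cache is snapshotted to TSM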
Thanks!

On Wednesday, December 21, 2016 at 11:20:46 AM UTC-8, Paul Dix wrote:
> Compactions shouldn't cause write timeouts. I would suspect that write timeouts are happening because you're posting too many values per request. You can also try increasing the max WAL cache size.
>
> How many actual values are you writing per request? That is, field values. For example:
>
> cpu,host=serverA usage_user=23,usage_system=5
>
> represents 2 values posted, not one. That might help narrow things down.
>
> On Wed, Dec 21, 2016 at 1:09 PM, Jeff <[email protected]> wrote:
> Facing an interesting problem with my current InfluxDB single-instance deployment. I'm running on an 8-core machine with 8 GB RAM (physical hardware), with InfluxDB v1.1.1 running in a Docker container.
>
> I'm writing 520 points in batches of 1560 every 3 seconds to a retention policy of "1w" with a "1d" shard group duration. Each point contains about 50 fields of data. The measurement has 115 fields in total, so for any given point most of the fields are empty, but across all series every field is used.
>
> There's 1 tag in the measurement, with about 520 series. I've got 1 continuous query configured to run every 3 minutes. The CQ is *massive*. It looks something like this:
>
> CREATE CONTINUOUS QUERY "3m" ON MyDB BEGIN SELECT mean(val1) AS val1, mean(val2) AS val2, .... this continues for ALL 115 fields ... INTO MyDB."16w".devices FROM MyDB."1w".devices GROUP BY time(3m), device END
>
> Surprisingly, I don't think the CQ is causing too much of a performance issue at the moment. Instead, what I'm seeing in the influx logs is the following:
>
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s
>
> Not only do those compaction times seem very long (23.5 seconds?), but while that full compaction is being performed I'm getting "timeout" on writes. That is, it starts taking longer than 10 seconds (the default InfluxDB HTTP write timeout) for the write to be performed/acknowledged by InfluxDB. I've seen the full compaction times hover around 30 s consistently, and they seem to happen about once every 30 minutes.
>
> The InfluxDB instance seems to be using all available RAM on the machine. I had to cap the Docker container at 6 GB of memory in order to not starve the rest of the system of resources.
>
> Here's a copy of my logs noting very long write times in conjunction with a full compaction occurring on the database.
>
> Process log (write duration is in ms):
>
> Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
> Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
> Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
> Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
> Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
> Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513
>
> Influx log:
>
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
> Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
> Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms
>
> Is there anything I can do to make these compaction times shorter? Would having smaller shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields causing a problem? I could potentially break up the measurement into multiple measurements, so that no single measurement has more than about 50 fields.
>
> Thanks for any suggestions!
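One more note on the last question quoted above: if breaking up the measurement turns out to be the right call, the split I'm picturing looks roughly like this (a rough Python sketch; the measurement names devices_a/devices_b/devices_c and the field groupings are made up for illustration):

    # Hypothetical split of the 115-field "devices" measurement into smaller
    # measurements, each carrying at most ~50 fields.
    FIELD_GROUPS = {
        "devices_a": ["val1", "val2", "val3"],      # ...first ~50 fields
        "devices_b": ["val51", "val52", "val53"],   # ...next ~50 fields
        "devices_c": ["val101", "val102"],          # ...remaining fields
    }

    def to_line_protocol(device, fields, timestamp):
        """Emit one line-protocol line per sub-measurement for a single point.
        Assumes numeric (float) field values."""
        lines = []
        for measurement, names in FIELD_GROUPS.items():
            present = {k: v for k, v in fields.items() if k in names and v is not None}
            if not present:
                continue  # this point has no data for this sub-measurement
            field_str = ",".join("{}={}".format(k, v) for k, v in present.items())
            lines.append("{},device={} {} {}".format(measurement, device, field_str, timestamp))
        return lines

No idea yet whether that would actually change the compaction behavior, but it would at least keep any single measurement's schema to about 50 fields.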
