Providing a little extra info here:

The measurement has 1,057 unique series. It has two retention policies: one of 
1w (used to store 1s-interval data) and one of 16w (used to store 1m-interval 
data produced by a Continuous Query that runs against the 1w RP every 1 
minute). The default retention policy is not being used.
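
For context, that setup is roughly equivalent to the following InfluxQL (a 
sketch only: the CQ name is illustrative, the "devices" measurement and 
val1/val2 fields are taken from the CQ quoted below, and the real CQ selects 
all 115 fields):

  CREATE RETENTION POLICY "1w" ON MyDB DURATION 1w REPLICATION 1 SHARD DURATION 1d
  CREATE RETENTION POLICY "16w" ON MyDB DURATION 16w REPLICATION 1
  CREATE CONTINUOUS QUERY "cq_1m" ON MyDB BEGIN
    SELECT mean(val1) AS val1, mean(val2) AS val2
    INTO MyDB."16w".devices FROM MyDB."1w".devices
    GROUP BY time(1m), device
  END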

-Jeff

On Wednesday, December 21, 2016 at 1:09:49 PM UTC-6, Jeff wrote:
> Facing an interesting problem with my current single-instance InfluxDB 
> deployment. I'm running on an 8-core machine with 8GB RAM (physical 
> hardware), with InfluxDB v1.1.1 running in a Docker container.
> 
> I'm writing 520 points in batches of 1560 every 3 seconds to a Retention 
> Policy of "1w" with a "1d" shard group duration. Each point contains about 50 
> fields of data; in total the measurement has 115 fields. So for any given 
> point most of the fields are empty, but across all series every field is 
> used.
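> 
> As a rough illustration (hypothetical device tag value and field values; a 
> real point carries ~50 of the 115 fields), each point in line protocol looks 
> something like:
> 
>   devices,device=dev-001 val1=0.42,val2=17.3,val9=3.0 1482346110000000000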
> 
> There's 1 tag in the measurement, with about 520 series. I've got 1 
> Continuous Query configured to run every 3 minutes. The CQ is *massive*. It 
> looks something like this:
> "CREATE CONTINUOUS QUERY "\"3m\"" ON MyDB BEGIN SELECT mean(val1) AS val1, 
> mean(val2) AS val2, .... this continues for ALL 115 fields ... INTO 
> MyDB."16w".devices FROM MyDB."1w".devices GROUP BY time(3m), device END"
> 
> Surprisingly, I don't think the CQ is causing too much of a performance issue 
> at the moment. Instead, what I'm seeing in the Influx logs is the following:
> 
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning 
> level 3 compaction of group 0, 4 TSM files
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 
> compacting level 3 group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 
> compacting level 3 group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 
> compacting level 3 group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
> Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 
> compacting level 3 group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted 
> level 3 group (0) into 
> /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted 
> level 3 4 files into 1 files in 6.339871251s
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning 
> full compaction of group 0, 2 TSM files
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
> Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted 
> full group (0) into 
> /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
> Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted 
> full 2 files into 1 files in 23.549201117s
> 
> Not only do those compaction times seem very long (23.5 seconds?), but while 
> that full compaction is being performed I'm getting "timeout" errors on 
> writes. That is, it starts taking longer than 10 seconds (the default Influx 
> HTTP write timeout) for a write to be performed/acknowledged by Influx. The 
> full compaction times hover around 30s consistently and seem to happen about 
> once every 30 minutes.
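> 
> If it matters: I assume the timeout I'm hitting is the one configurable in 
> influxdb.conf, i.e. the [coordinator] write-timeout in 1.x if I understand 
> correctly:
> 
>   [coordinator]
>     write-timeout = "10s"
> 
> Raising it would only mask the symptom though, so I'd rather fix the 
> compaction itself.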
> 
> The InfluxDB instance seems to be using all available RAM on the machine. I 
> had to cap the Docker container at 6GB memory usage in order not to starve 
> the rest of the system of resources.
> 
> Here's a copy of my logs noting very long write times in conjunction with a 
> full compaction occurring on the database:
> Process log (write duration is in ms):
> Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db 
> long write duration: 9824
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db 
> long write duration: 8242
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db 
> long write duration: 5260
> Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db 
> long write duration: 2273
> Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db 
> long write duration: 5044
> Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db 
> long write duration: 3036
> Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db 
> long write duration: 2393
> Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db 
> long write duration: 1588
> Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db 
> long write duration: 1513
> 
> Influx log:
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted 
> level 3 group (0) into 
> /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted 
> level 3 8 files into 1 files in 13.399871009s
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning 
> full compaction of group 0, 2 TSM files
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
> Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted 
> full group (0) into 
> /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted 
> full 2 files into 1 files in 21.447891815s
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning 
> full compaction of group 0, 2 TSM files
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
> Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 
> compacting full group (0) 
> /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
> Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot 
> for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
> Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot 
> for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms
> 
> Is there anything I can do to shorten these compaction times? Would smaller 
> shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields 
> causing a problem? I could potentially break the measurement up into several 
> measurements so that no one measurement has more than about 50 fields.
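> 
> If smaller shard groups are worth trying, I assume the change would be 
> something like this (just a sketch; I'd have to check whether it only applies 
> to newly created shard groups):
> 
>   ALTER RETENTION POLICY "1w" ON MyDB SHARD DURATION 1h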
> 
> Thanks for any suggestions!
