80k values per second should be no problem. We regularly test at > 800k values/sec and what's on master now will do ~2M values/sec if you're on a large enough box.

You should be posting 1k-2k values per post, but have multiple threads or processes doing it. Concurrency is the key. The total number of values/sec shouldn't be a problem on your hardware (assuming you're doing < 100k values/sec).
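For illustration, here is a minimal sketch of that batching-plus-concurrency pattern against the 1.x HTTP /write endpoint. The host, database name ("mydb"), helper names, and batch contents are assumptions for the example, not something specified in this thread:

    # Sketch: post pre-built line-protocol batches concurrently.
    # Assumes InfluxDB 1.x listening on localhost:8086 and a database named
    # "mydb"; aim for roughly 1k-2k field values per POST, per the advice above.
    from concurrent.futures import ThreadPoolExecutor

    import requests

    WRITE_URL = "http://localhost:8086/write"

    def post_batch(lines):
        """Send one batch of line-protocol points in a single POST."""
        resp = requests.post(
            WRITE_URL,
            params={"db": "mydb", "precision": "s"},
            data="\n".join(lines).encode("utf-8"),
            timeout=10,
        )
        resp.raise_for_status()
        return len(lines)

    def post_batches(batches, workers=8):
        """Write many small batches in parallel; concurrency carries the total rate."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(post_batch, batches))

Splitting each second's payload into many small batches and handing them to something like post_batches() keeps individual requests small while the writer as a whole still pushes the same values/sec.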
On Sat, Dec 24, 2016 at 10:00 AM, <[email protected]> wrote:

> Is there any kind of guide on how I should be sizing these numbers?
>
> I tried doubling the cache-max-memory-size and quadrupling the cache-snapshot-memory-size. I also tried writing fewer values per request, but that didn't really seem to help.
>
> I tried switching to writing only 1k, 5k, 8k, 10k, 20k, and 80k values per request. I had my process log a warning any time a request took longer than one second to complete, since I'm generating data at a rate of once per second. This resulted in anywhere from 1 to 50 posts per second (I had overestimated the number of values per second I was writing - it's actually somewhere around 50k). It seemed like somewhere around 30k values per request actually worked best. I'd very frequently see multiple requests take longer than 1s when sending 50 posts of 1k values each per second. With 30k values per post, the individual posts would take about 1-3 seconds to complete but seemed to "catch up" every now and then. That is, it could go for up to 10 seconds without a request taking longer than 1s to complete, whereas with 50 posts per second I'd always see at least a handful of posts take longer than 1s in every group.
>
> I did still see timeout errors with this configuration:
>
> Dec 24 10:12:37 process [27383]: 2016-12-24T10:12:37.662Z - error: {"error":"timeout"}
> Dec 24 10:12:37 process [27383]: 2016-12-24T10:12:37.671Z - warn: db long write duration: 10037
> ... (there were about 7 of these in a row, all taking longer than 10s to complete and timing out)
>
> And this is the compaction log from influx around the same time:
>
> Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacted full group (0) into /var/lib/influxdb/data/host/1w/1224/000000689-000000005.tsm.tmp (#0)
> Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacted full 4 files into 1 files in 2m38.182651921s
> Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 beginning full compaction of group 0, 2 TSM files
> Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacting full group (0) /var/lib/influxdb/data/host/1w/1224/000000496-000000006.tsm (#0)
> Dec 24 10:10:10 host influxdb[26479]: [tsm1] 2016/12/24 10:10:10 compacting full group (0) /var/lib/influxdb/data/host/1w/1224/000000689-000000005.tsm (#1)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 beginning level 1 compaction of group 0, 6 TSM files
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#0)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#1)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000690-000000001.tsm (#2)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#3)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#4)
> Dec 24 10:12:15 host influxdb[26479]: [tsm1] 2016/12/24 10:12:15 compacting level 1 group (0) /var/lib/influxdb/data/host/1w/1224/000000691-000000001.tsm (#5)
> Dec 24 10:12:16 host influxdb[26479]: [tsm1] 2016/12/24 10:12:16 Snapshot for path /var/lib/influxdb/data/host/1w/1224 written in 4.217841051s
> Dec 24 10:12:26 host influxdb[26479]: [tsm1] 2016/12/24 10:12:26 compacted level 1 group (0) into /var/lib/influxdb/data/host/1w/1224/000000691-000000002.tsm.tmp (#0)
> Dec 24 10:12:26 host influxdb[26479]: [tsm1] 2016/12/24 10:12:26 compacted level 1 6 files into 1 files in 10.458949133s
> Dec 24 10:14:22 host influxdb[26479]: [tsm1] 2016/12/24 10:14:22 compacted full group (0) into /var/lib/influxdb/data/host/1w/1224/000000689-000000006.tsm.tmp (#0)
> Dec 24 10:14:22 host influxdb[26479]: [tsm1] 2016/12/24 10:14:22 compacted full 2 files into 1 files in 4m11.403244468s
>
> Compaction times have definitely gone up there, at 4 minutes and 11 seconds.
>
> Do you have any further suggestions for how I can "tune" influx to handle this large and fast volume of writes? I could send posts less frequently, but it's still the same amount of data, so if I did posts every 3 seconds I would have to send 3x the number of requests every 3 seconds.
>
> Most queries that run against the DB are for realtime charts (similar to grafana), displaying a 5 or 10 minute window of 1s data for a small number of values and tags. These queries seem to be pretty performant (only taking about 70ms for a batch of 5 queries).
>
> I'm still not seeing any bottlenecks in terms of memory or CPU (as in, I never see either of them really spike or max out). The hard drive is a modern SSD and we recently increased the RAM to 16GB. I'm not sure what's causing the long write times, or if it's just the combination of queries, the continuous query, and compaction that's giving it a hard time.
>
> Thanks again for the help so far!
>
> On Thursday, December 22, 2016 at 9:45:48 AM UTC-6, Paul Dix wrote:
> > You might try breaking it up further. We generally do performance tests with 1k-10k values per request. You can set the WAL snapshotting sizes here:
> > https://github.com/influxdata/influxdb/blob/master/etc/config.sample.toml#L62-L68
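For reference, those knobs sit in the [data] section of the config file linked above. A sketch with the 1.x key names; the values shown are illustrative, not recommendations:

    # Sketch of the [data] cache/WAL snapshot settings referenced above.
    # Key names follow InfluxDB 1.x's config.sample.toml; values are examples only.
    [data]
      # Max size a shard's in-memory cache can reach before it rejects writes.
      cache-max-memory-size = 1073741824
      # Cache size at which the engine snapshots the cache to a TSM file,
      # releasing the corresponding WAL segments.
      cache-snapshot-memory-size = 26214400
      # Snapshot anyway if the cache has gone this long without receiving writes.
      cache-snapshot-write-cold-duration = "10m"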
> > On Wed, Dec 21, 2016 at 2:40 PM, <[email protected]> wrote:
> > > I'm writing approximately 78,000 values per request (about 50 values per point with 1560 points every 3 seconds). I saw similar behavior when writing 26,000 values per request every 1 second.
> > >
> > > Should I try breaking those up into smaller writes instead of larger ones?
> > >
> > > How can I adjust the max WAL cache size? I don't see that as an available configuration option in v1.1:
> > > https://docs.influxdata.com/influxdb/v1.1/administration/config#environment-variables
> > >
> > > Thanks!
> > >
> > > On Wednesday, December 21, 2016 at 11:20:46 AM UTC-8, Paul Dix wrote:
> > > > Compactions shouldn't cause write timeouts. I would suspect that write timeouts are happening because you're posting too many values per request. You can also try increasing the max WAL cache size.
> > > >
> > > > How many actual values are you writing per request? That is, field values. For example:
> > > >
> > > > cpu,host=serverA usage_user=23,usage_system=5
> > > >
> > > > represents 2 values posted, not one. That might help narrow things down.
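To make that counting rule concrete, here is a small sketch; the parsing is deliberately naive and assumes no escaped spaces or commas inside measurements, tags, or field values:

    # Sketch: count field values in a line-protocol batch, per the rule above
    # (every field on every point counts as one value).
    def count_values(lines):
        total = 0
        for line in lines:
            field_set = line.split(" ")[1]       # "measurement,tags fields [timestamp]"
            total += len(field_set.split(","))   # one value per field
        return total

    print(count_values(["cpu,host=serverA usage_user=23,usage_system=5"]))  # -> 2

By that count, ~1560 points with ~50 fields each is ~78,000 values in a single request, well above the 1k-10k per request used in the performance tests mentioned above.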
> > > > On Wed, Dec 21, 2016 at 1:09 PM, Jeff <[email protected]> wrote:
> > > > > Facing an interesting problem with my current InfluxDB single-instance deployment. I'm running on an 8 core machine with 8GB RAM (physical hardware) with InfluxDB v1.1.1 running in a docker container.
> > > > >
> > > > > I'm writing 520 points per second, in batches of 1560 every 3 seconds, to a Retention Policy of "1w" with a "1d" shard group duration. Each point contains about 50 fields of data. In total the measurement has 115 fields, so for any given point most of the fields are empty, but across all series every field is used.
> > > > >
> > > > > There's 1 tag in the measurement with about 520 series. I've got 1 ContinuousQuery configured to run every 3 minutes. The CQ is *massive*. It looks something like this:
> > > > >
> > > > > "CREATE CONTINUOUS QUERY "\"3m\"" ON MyDB BEGIN SELECT mean(val1) AS val1, mean(val2) AS val2, .... this continues for ALL 115 fields ... INTO MyDB."16w".devices FROM MyDB."1w".devices GROUP BY time(3m), device END"
> > > > >
> > > > > Surprisingly, I don't think the CQ is causing too much of a performance issue at the moment. Instead, what I'm seeing in the influx logs is the following:
> > > > >
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 beginning level 3 compaction of group 0, 4 TSM files
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000773-000000003.tsm (#0)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000777-000000003.tsm (#1)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000781-000000003.tsm (#2)
> > > > > Dec 21 18:08:30 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:30 compacting level 3 group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000003.tsm (#3)
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm.tm
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacted level 3 4 files into 1 files in 6.339871251s
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000769-000000005.tsm (#0)
> > > > > Dec 21 18:08:37 hostname influxdb[3119]: [tsm1] 2016/12/21 18:08:37 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000785-000000004.tsm (#1)
> > > > > Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000785-000000005.tsm.tmp (
> > > > > Dec 21 18:09:00 hostname influxdb[3119]: [tsm1] 2016/12/21 18:09:00 compacted full 2 files into 1 files in 23.549201117s
> > > > >
> > > > > Not only do those compaction times seem very long (23.5 seconds?), but while that full compaction is being performed I'm getting "timeout" on writes. That is, it starts taking longer than 10 seconds (the default influx http write timeout) for the write to be performed/acknowledged by influx. I've seen the full compaction times hover around 30s consistently, and they seem to happen about once every 30 minutes.
> > > > >
> > > > > The InfluxDB instance seems to be using all available RAM on the machine. I had to cap the docker container at 6GB memory usage in order to not starve the rest of the system of resources.
> > > > >
> > > > > Here's a copy of my logs noting very long write times in conjunction with a full compaction occurring on the database:
> > > > >
> > > > > Process log (write duration is in ms):
> > > > > Dec 21 12:28:42 hostname process[11361]: 2016-12-21T12:28:42.615Z - warn: db long write duration: 9824
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.106Z - warn: db long write duration: 8242
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.214Z - warn: db long write duration: 5260
> > > > > Dec 21 12:28:44 hostname process[11361]: 2016-12-21T12:28:44.314Z - warn: db long write duration: 2273
> > > > > Dec 21 12:29:23 hostname process[11361]: 2016-12-21T12:29:23.667Z - warn: db long write duration: 5044
> > > > > Dec 21 12:29:24 hostname process[11361]: 2016-12-21T12:29:24.710Z - warn: db long write duration: 3036
> > > > > Dec 21 12:29:54 hostname process[11361]: 2016-12-21T12:29:54.533Z - warn: db long write duration: 2393
> > > > > Dec 21 12:29:56 hostname process[11361]: 2016-12-21T12:29:56.793Z - warn: db long write duration: 1588
> > > > > Dec 21 12:30:33 hostname process[11361]: 2016-12-21T12:30:33.274Z - warn: db long write duration: 1513
> > > > >
> > > > > Influx log:
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm.tm
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacted level 3 8 files into 1 files in 13.399871009s
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000513-000000005.tsm (#0)
> > > > > Dec 21 12:28:22 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:22 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000004.tsm (#1)
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full group (0) into /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm.tmp (
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacted full 2 files into 1 files in 21.447891815s
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 beginning full compaction of group 0, 2 TSM files
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000337-000000006.tsm (#0)
> > > > > Dec 21 12:28:44 hostname influxdb[3119]: [tsm1] 2016/12/21 12:28:44 compacting full group (0) /var/lib/influxdb/data/hostname/1w/1212/000000529-000000005.tsm (#1)
> > > > > Dec 21 12:29:26 hostname influxdb[3119]: [tsm1] 2016/12/21 12:29:26 Snapshot for path /var/lib/influxdb/data/hostname/1w/1212 written in 788.281773ms
> > > > > Dec 21 12:30:04 hostname influxdb[3119]: [tsm1] 2016/12/21 12:30:04 Snapshot for path /var/lib/influxdb/data/hostname/16w/1213 written in 985.274321ms
> > > > >
> > > > > Is there anything I can do to help these compaction times be shorter? Would having smaller shard groups (maybe 1h instead of 1d) help? Is the sheer number of fields causing a problem? I could potentially break up the measurement into multiple measurements such that no one measurement has more than about 50 fields.
> > > > >
> > > > > Thanks for any suggestions!
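On the shard-group question at the end: a sketch of how the shard group duration could be changed for an experiment, using InfluxQL 1.x syntax and the database/policy names from this thread. Nothing in the thread confirms that a smaller shard group duration would actually shorten compactions:

    -- Sketch: shrink the shard group duration on the existing "1w" retention
    -- policy. Only newly created shard groups pick up the new duration.
    ALTER RETENTION POLICY "1w" ON "MyDB" SHARD DURATION 1h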
