On Thursday, August 11, 2016 at 5:44:37 PM UTC-6, [email protected] wrote:
> tl;dr While stress testing InfluxEnterprise, we saw unexpected memory behaviour that we would like to explain.
>
> Yesterday, I spent some time stressing our self-hosted 2-node InfluxEnterprise cluster with a high volume of writes to a test database. The cluster's data nodes are two AWS m4.large instances, each with 2 vCPUs and 8 GB of memory. The servers run Debian 8 (Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1 (2015-05-24) x86_64 GNU/Linux) and are used for nothing else.
>
> The database was created specifically for this test. The measurements and series stored in it are defined in the test script: https://gist.github.com/goakley/2bc6503f3bcfbb25270429fa2b3c9e4b Note the low series cardinality for the database. This database lives in the same cluster as a `telegraf` database, which has a cardinality of 13k.
>
> To stress the system, we ran the test script at 10k writes/s on three different machines. Here is a graph showing memory and CPU utilization during testing: http://i.imgur.com/F0GeDT8.png
>
> Refer to the times on the image above. The test started at 16:00, with an approximate write throughput of 30k/s. As expected, CPU utilization was maxed out in an attempt to keep up with the write load. At around 16:30, our scripts started to time out while writing to InfluxDB (with a 4-second wait time), although data was still flowing into the system. The scripts remained in this state for the rest of the test. At 16:45, we saw an unexpected spike in memory usage, until one machine's InfluxDB process was killed, automatically restarted, and recovered all unwritten data. Shortly afterwards we stopped the scripts, and the cluster recovered.
>
> Here is the log from the server that maxed out its memory, covering the time at which that event occurred: https://drive.google.com/file/d/0B5o3UEMmVkdIQVgxM0VuTlR0Z3M/view?usp=sharing
>
> The question is why InfluxDB suddenly started consuming all the system memory, and why it happened only after the test had been running for a while.
>
> I see some possible reasons suggested in this InfluxDB feature request: https://github.com/influxdata/influxdb/issues/7142 I am wondering if anyone else has seen a similar pattern and can explain the behaviour.
Hi,

If you can reproduce this reliably, can you grab some profile data from each node while memory is high, using the following:

curl -o block.txt "http://localhost:8086/debug/pprof/block?debug=1"
curl -o goroutine.txt "http://localhost:8086/debug/pprof/goroutine?debug=1"
curl -o heap.txt "http://localhost:8086/debug/pprof/heap?debug=1"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

Also, your client doesn't show what is being sent as tags or values. Could you clarify which values are tags and which are fields?

Was the process killed by the Linux OOM killer? I don't see a trace in the log file, which makes me think the OS killed the process rather than an allocation triggering an OOM.

There have been issues logged for high memory usage due to clients not closing connections. I can't see your client code, but have you verified that connections are being closed properly, or that connections are being re-used? Are you using the InfluxDB Go client or something custom behind the scenes?
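
If your writers are plain HTTP clients, one quick thing to check is that they share a single client/connection pool and always drain and close response bodies; otherwise every write can leave a connection (and a server-side handler) hanging around. As a rough illustration only, here is a minimal Go sketch against the /write endpoint using the standard library. It is not your actual client code, and the database name, URL, and line-protocol point are placeholders:

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

// A single shared client, so TCP connections are pooled and re-used
// across writes instead of being re-opened for every request.
var httpClient = &http.Client{Timeout: 4 * time.Second}

// writePoints posts line-protocol data to the /write endpoint.
// The URL and database name ("stress") are placeholders.
func writePoints(lineProtocol string) error {
	url := "http://localhost:8086/write?db=stress"
	resp, err := httpClient.Post(url, "text/plain", bytes.NewBufferString(lineProtocol))
	if err != nil {
		return err
	}
	// Draining and closing the body is what lets the underlying
	// connection go back into the pool for re-use.
	defer resp.Body.Close()
	io.Copy(ioutil.Discard, resp.Body)
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("write failed: %s", resp.Status)
	}
	return nil
}

func main() {
	// Example point; measurement, tags and fields here are illustrative only.
	if err := writePoints("cpu,host=server01 value=0.64"); err != nil {
		fmt.Println(err)
	}
}

If you are using the official Go client instead, the equivalent concern is creating one client up front and re-using it for all batches (and closing it when you are done), rather than constructing a new client per write.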
