Tanya, what range of time does your data cover? What are the retention policies on the database?
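
Both of those can be read back over the 1.x HTTP /query endpoint. A minimal sketch, assuming a local server with auth enabled; the database "mydb", measurement "blah", field "value", and the credentials are placeholders, not details from this thread:

    # Sketch: read back the retention policies and the measurement's time range.
    # "mydb", "blah", "value", and the credentials are placeholders.
    import requests

    QUERY_URL = "http://localhost:8086/query"
    AUTH = ("admin", "secret")          # auth-enabled = true in the config quoted below

    queries = [
        'SHOW RETENTION POLICIES ON "mydb"',
        'SELECT FIRST("value"), LAST("value") FROM "blah"',  # oldest and newest points
    ]
    for q in queries:
        resp = requests.get(QUERY_URL, params={"db": "mydb", "q": q}, auth=AUTH)
        print(q, "->", resp.json())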
On Tue, Oct 11, 2016 at 11:14 PM, Tanya Unterberger <[email protected]> wrote:

> Hi Sean,
>
> 1. Initially I killed the process.
> 2. At some point I restarted the influxdb service.
> 3. The error logs show no errors.
> 4. I rebuilt the server, installed the latest rpm, and reimported the data via scripts. The data goes in, but the server is unusable. It looks like the indexing might be stuffed. The size of the data in that database is 38M; the total size of /var/lib/influxdb/data/ is 273M.
> 5. The CPU went berserk and doesn't come down.
> 6. A query like select count(blah) against the measurement that was batch-inserted (10k records at a time) is unusable and times out.
> 7. I need to import around 15 million records. How should I throttle that?
>
> At the moment I am pulling my hair out (not a pretty sight).
>
> Thanks a lot!
> Tanya
>
> On 12 October 2016 at 06:11, Sean Beckett <[email protected]> wrote:
>
>> On Tue, Oct 11, 2016 at 12:11 AM, <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> It seems that the old issue might have surfaced again (#3349) in v1.0.
>>>
>>> I tried to insert a large number of records (3,913,595) via a script, inserting 10,000 rows at a time.
>>>
>>> After a while I received:
>>>
>>> HTTP/1.1 500 Internal Server Error
>>> Content-Type: application/json
>>> Request-Id: ac8ebbbe-8f70-11e6-8ce7-000000000000
>>> X-Influxdb-Version: 1.0.0
>>> Date: Tue, 11 Oct 2016 05:12:02 GMT
>>> Content-Length: 20
>>>
>>> {"error":"timeout"}
>>> HTTP/1.1 100 Continue
>>>
>>> I killed the process, after which the whole box became pretty much unresponsive.
>>
>> Killed the InfluxDB process, or the batch-writing script process?
>>
>>> There is nothing in the logs (i.e. sudo ls /var/log/influxdb/ gives me nothing), although the setting for http logging is true:
>>
>> systemd OSes put the logs in a new place (yay!?). See http://docs.influxdata.com/influxdb/v1.0/administration/logs/#systemd for how to read the logs.
>>
>>> [http]
>>> enabled = true
>>> bind-address = ":8086"
>>> auth-enabled = true
>>> log-enabled = true
>>>
>>> I tried to restart influx, but got the following error:
>>>
>>> Failed to connect to http://localhost:8086
>>> Please check your connection settings and ensure 'influxd' is running.
>>
>> The `influx` console is just a fancy wrapper on the API. That error doesn't mean much except that the HTTP listener in InfluxDB is not yet up and running.
>>
>>> Although I can see that influxd is up and running:
>>>
>>> > systemctl | grep influx
>>> influxdb.service   loaded active running   InfluxDB is an open-source, distributed, time series database
>>>
>>> What do I do now?
>>
>> Check the logs as referenced above.
>>
>> The non-responsiveness on startup isn't surprising. It sounds like the system was overwhelmed with writes, which means that the WAL would have many points cached, waiting to be flushed to disk. On restart, InfluxDB won't accept new writes or queries until the cached ones in the WAL have persisted. For this reason, the HTTP listener is off until the WAL is flushed.
>>
>>> I tried the same import over the weekend; the script timeout happened eventually, but the result was the same unresponsive, unusable server. We rebuilt the box and started again.
>>
>> It sounds like the box is just overwhelmed. Did you get backoff messages from the writes before the crash? What are the machine specs?
>>
>>> Perhaps it is worthwhile mentioning that the same measurement already contained about 9 million records. Some of these records had the same timestamp as the ones I tried to import, i.e. they should have been merged.
>>
>> Overwriting points is much, much more expensive than posting new points. Each overwritten point triggers a tombstone record which must later be processed. This can trigger frequent compactions of the TSM files. With a high write load and frequent compactions, the system would encounter significant CPU pressure.
>>
>>> Interestingly enough, the same amount of data was fine when I forgot to add precision in ms, i.e. all records were imported as nanoseconds, but in fact they "lacked" 6 zeroes.
>>
>> That would mean all points are going to the same shard. It is more resource-intensive to load points across a wide range of time, since more shard files are involved. InfluxDB does best with sequential, chronologically ordered, unique points from the very recent past. The more the write operation differs from that, the lower the throughput.
>>
>>> Please advise what kind of action I can take.
>>
>> Look in the logs for errors. Throttle the writes. Don't overwrite more points than you have to.
>>
>>> Thanks a lot!
>>> Tanya
>>
>> --
>> Sean Beckett
>> Director of Support and Professional Services
>> InfluxDB
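
On Tanya's question 7 and the "throttle the writes" advice: one pattern that tends to keep a single node responsive is to send modest batches in chronological order, pause between them, and back off whenever the server reports it is falling behind. A rough sketch against the 1.x /write endpoint follows; the database name, credentials, batch size, and pause are placeholders, not recommendations from this thread.

    # Sketch of a throttled bulk import against the InfluxDB 1.x /write endpoint.
    # "mydb", the credentials, the 5,000-point batches, and the 0.5 s pause are
    # placeholders; tune them against what the box can actually sustain.
    import time
    import requests

    WRITE_URL = "http://localhost:8086/write"
    PARAMS = {"db": "mydb", "precision": "ms"}   # timestamps sent as milliseconds
    AUTH = ("admin", "secret")

    def send_batch(lines, retries=5):
        body = "\n".join(lines)                  # line-protocol points, one per line
        for attempt in range(retries):
            resp = requests.post(WRITE_URL, params=PARAMS, data=body,
                                 auth=AUTH, timeout=30)
            if resp.status_code == 204:          # write accepted
                return
            if resp.status_code in (500, 503):   # server is falling behind: back off
                time.sleep(5 * (attempt + 1))
                continue
            resp.raise_for_status()              # 4xx: bad points, retrying won't help
        raise RuntimeError("server kept refusing the batch; slow the import down")

    def import_points(points, batch_size=5000, pause=0.5):
        batch = []
        for line in points:                      # `points` yields line-protocol strings
            batch.append(line)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []
                time.sleep(pause)                # throttle between batches
        if batch:
            send_batch(batch)

Smaller batches with a pause between them give compactions time to catch up, which is usually enough to avoid the {"error":"timeout"} responses quoted above.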
--
Sean Beckett
Director of Support and Professional Services
InfluxDB
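
A footnote on the precision point in the quoted thread: a millisecond epoch written without precision=ms is parsed as nanoseconds (the default), so every point lands in the first half-hour of 1970 and therefore in a single shard, which is why that accidental import went through easily. The arithmetic, as a small sketch:

    # A millisecond timestamp misread as nanoseconds collapses to early 1970.
    from datetime import datetime, timezone

    ts_ms = 1476162722000   # 2016-10-11 05:12:02 UTC, in milliseconds
    read_as_ns = datetime.fromtimestamp(ts_ms / 1e9, tz=timezone.utc)
    read_as_ms = datetime.fromtimestamp(ts_ms / 1e3, tz=timezone.utc)
    print(read_as_ns)   # first half-hour of 1970: every point in one shard
    print(read_as_ms)   # 2016-10-11: points spread across 2016's shards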
