Tanya, what range of time does your data cover? What are the retention policies on the database?
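
Both of those can be read back over the 1.x HTTP /query endpoint. A minimal sketch, assuming a local server with auth enabled; the database "mydb", measurement "blah", field "value", and the credentials are placeholders, not details from this thread:

    # Sketch: read back the retention policies and the measurement's time range.
    # "mydb", "blah", "value", and the credentials are placeholders.
    import requests

    QUERY_URL = "http://localhost:8086/query"
    AUTH = ("admin", "secret")          # auth-enabled = true in the config quoted below

    queries = [
        'SHOW RETENTION POLICIES ON "mydb"',
        'SELECT FIRST("value"), LAST("value") FROM "blah"',  # oldest and newest points
    ]
    for q in queries:
        resp = requests.get(QUERY_URL, params={"db": "mydb", "q": q}, auth=AUTH)
        print(q, "->", resp.json())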
On Tue, Oct 11, 2016 at 11:14 PM, Tanya Unterberger <[email protected]> wrote:

> Hi Sean,
>
> 1. Initially I killed the process.
> 2. At some point I restarted the influxdb service.
> 3. The error logs show no errors.
> 4. I rebuilt the server, installed the latest rpm, and reimported the data via scripts. The data goes in, but the server is unusable. It looks like the indexing might be stuffed. The size of the data in that database is 38M; the total size of /var/lib/influxdb/data/ is 273M.
> 5. The CPU went berserk and doesn't come down.
> 6. A query like select count(blah) against the measurement that was batch-inserted (10k records at a time) is unusable and times out.
> 7. I need to import around 15 million records. How should I throttle that?
>
> At the moment I am pulling my hair out (not a pretty sight).
>
> Thanks a lot!
> Tanya
>
> On 12 October 2016 at 06:11, Sean Beckett <[email protected]> wrote:
>
>> On Tue, Oct 11, 2016 at 12:11 AM, <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> It seems that the old issue might have surfaced again (#3349) in v1.0.
>>>
>>> I tried to insert a large number of records (3,913,595) via a script, inserting 10,000 rows at a time.
>>>
>>> After a while I received:
>>>
>>> HTTP/1.1 500 Internal Server Error
>>> Content-Type: application/json
>>> Request-Id: ac8ebbbe-8f70-11e6-8ce7-000000000000
>>> X-Influxdb-Version: 1.0.0
>>> Date: Tue, 11 Oct 2016 05:12:02 GMT
>>> Content-Length: 20
>>>
>>> {"error":"timeout"}
>>> HTTP/1.1 100 Continue
>>>
>>> I killed the process, after which the whole box became pretty much unresponsive.
>>
>> Killed the InfluxDB process, or the batch-writing script process?
>>
>>> There is nothing in the logs (i.e. sudo ls /var/log/influxdb/ gives me nothing), although the setting for http logging is true:
>>
>> systemd OSes put the logs in a new place (yay!?). See http://docs.influxdata.com/influxdb/v1.0/administration/logs/#systemd for how to read the logs.
>>
>>> [http]
>>> enabled = true
>>> bind-address = ":8086"
>>> auth-enabled = true
>>> log-enabled = true
>>>
>>> I tried to restart influx, but got the following error:
>>>
>>> Failed to connect to http://localhost:8086
>>> Please check your connection settings and ensure 'influxd' is running.
>>
>> The `influx` console is just a fancy wrapper on the API. That error doesn't mean much except that the HTTP listener in InfluxDB is not yet up and running.
>>
>>> Although I can see that influxd is up and running:
>>>
>>> > systemctl | grep influx
>>> influxdb.service   loaded active running   InfluxDB is an open-source, distributed, time series database
>>>
>>> What do I do now?
>>
>> Check the logs as referenced above.
>>
>> The non-responsiveness on startup isn't surprising. It sounds like the system was overwhelmed with writes, which means that the WAL would have many points cached, waiting to be flushed to disk. On restart, InfluxDB won't accept new writes or queries until the cached ones in the WAL have persisted. For this reason, the HTTP listener is off until the WAL is flushed.
>>
>>> I tried the same import over the weekend; the script timeout happened eventually, but the result was the same unresponsive, unusable server. We rebuilt the box and started again.
>>
>> It sounds like the box is just overwhelmed. Did you get backoff messages from the writes before the crash? What are the machine specs?
>>
>>> Perhaps it is worthwhile mentioning that the same measurement already contained about 9 million records. Some of these records had the same timestamp as the ones I tried to import, i.e. they should have been merged.
>>
>> Overwriting points is much, much more expensive than posting new points. Each overwritten point triggers a tombstone record which must later be processed. This can trigger frequent compactions of the TSM files. With a high write load and frequent compactions, the system would encounter significant CPU pressure.
>>
>>> Interestingly enough, the same amount of data was fine when I forgot to add precision in ms, i.e. all records were imported as nanoseconds, but in fact they "lacked" 6 zeroes.
>>
>> That would mean all points are going to the same shard. It is more resource-intensive to load points across a wide range of time, since more shard files are involved. InfluxDB does best with sequential, chronologically ordered, unique points from the very recent past. The more the write operation differs from that, the lower the throughput.
>>
>>> Please advise what kind of action I can take.
>>
>> Look in the logs for errors. Throttle the writes. Don't overwrite more points than you have to.
>>
>>> Thanks a lot!
>>> Tanya
>>
>> --
>> Sean Beckett
>> Director of Support and Professional Services
>> InfluxDB
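
On Tanya's question 7 and the "throttle the writes" advice: one pattern that tends to keep a single node responsive is to send modest batches in chronological order, pause between them, and back off whenever the server reports it is falling behind. A rough sketch against the 1.x /write endpoint follows; the database name, credentials, batch size, and pause are placeholders, not recommendations from this thread.

    # Sketch of a throttled bulk import against the InfluxDB 1.x /write endpoint.
    # "mydb", the credentials, the 5,000-point batches, and the 0.5 s pause are
    # placeholders; tune them against what the box can actually sustain.
    import time
    import requests

    WRITE_URL = "http://localhost:8086/write"
    PARAMS = {"db": "mydb", "precision": "ms"}   # timestamps sent as milliseconds
    AUTH = ("admin", "secret")

    def send_batch(lines, retries=5):
        body = "\n".join(lines)                  # line-protocol points, one per line
        for attempt in range(retries):
            resp = requests.post(WRITE_URL, params=PARAMS, data=body,
                                 auth=AUTH, timeout=30)
            if resp.status_code == 204:          # write accepted
                return
            if resp.status_code in (500, 503):   # server is falling behind: back off
                time.sleep(5 * (attempt + 1))
                continue
            resp.raise_for_status()              # 4xx: bad points, retrying won't help
        raise RuntimeError("server kept refusing the batch; slow the import down")

    def import_points(points, batch_size=5000, pause=0.5):
        batch = []
        for line in points:                      # `points` yields line-protocol strings
            batch.append(line)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []
                time.sleep(pause)                # throttle between batches
        if batch:
            send_batch(batch)

Smaller batches with a pause between them give compactions time to catch up, which is usually enough to avoid the {"error":"timeout"} responses quoted above.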
--
Sean Beckett
Director of Support and Professional Services
InfluxDB
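
A footnote on the precision point in the quoted thread: a millisecond epoch written without precision=ms is parsed as nanoseconds (the default), so every point lands in the first half-hour of 1970 and therefore in a single shard, which is why that accidental import went through easily. The arithmetic, as a small sketch:

    # A millisecond timestamp misread as nanoseconds collapses to early 1970.
    from datetime import datetime, timezone

    ts_ms = 1476162722000   # 2016-10-11 05:12:02 UTC, in milliseconds
    read_as_ns = datetime.fromtimestamp(ts_ms / 1e9, tz=timezone.utc)
    read_as_ms = datetime.fromtimestamp(ts_ms / 1e3, tz=timezone.utc)
    print(read_as_ns)   # first half-hour of 1970: every point in one shard
    print(read_as_ms)   # 2016-10-11: points spread across 2016's shards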
