Tanya, when you write the data in milliseconds but don't specify the precision, the database interprets those millisecond timestamps as nanoseconds, so all the data lands in a single shard whose time range covers Jan 1, 1970.
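The fix is to tell the write endpoint what precision the incoming timestamps use. A minimal sketch against the 1.x HTTP write API, assuming a database named mydb (the measurement and point are the same ones as in the transcript that follows):

# Without a precision, this millisecond timestamp is read as nanoseconds
# and the point lands a few minutes after the 1970 epoch:
curl -i -XPOST 'http://localhost:8086/write?db=mydb' \
  --data-binary 'msns value=42 1476336190000'

# Declaring the precision puts the point where it belongs (October 2016):
curl -i -XPOST 'http://localhost:8086/write?db=mydb&precision=ms' \
  --data-binary 'msns value=42 1476336190000'

Writing with the correct precision is also what spreads your 1838-2016 data across many shards, which is why the "correct" import is so much heavier than the accidental one. A throttled bulk-import sketch is appended at the bottom of this message.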
> insert msns value=42 1476336190000
> select * from msns
name: msns
----------
time                            value
1476336190000                   42

> precision rfc3339
> select * from msns
name: msns
----------
time                            value
1970-01-01T00:24:36.33619Z      42

That's why everything is fast: all the data is in one shard.

On Wed, Oct 12, 2016 at 9:50 PM, Tanya Unterberger <[email protected]> wrote:

> Hi Sean,
>
> I can reproduce all the CPU issues, slowness, etc. if I try to import the data that I have in milliseconds, specifying the precision as milliseconds.
>
> If I insert the same data without specifying any precision and query without specifying any precision, the database is lightning fast. The same data.
>
> The reason I was adding precision=ms is that I thought it was the right thing to do. The manual advises that Influx stores the data in nanoseconds but to use the lowest precision when inserting. So at some stage I even converted the timestamps to hours and inserted the data with precision=h. When Influx tried to convert that data to nanoseconds, index it, etc., it had a hissy fit.
>
> Is this a bug, or should the manual state that if you query the data at the same precision you used to insert it, you can go with the lowest precision and not specify the precision at all?
>
> Thanks,
> Tanya
>
> On 13 October 2016 at 10:26, Tanya Unterberger <[email protected]> wrote:
>
>> Hi Sean,
>>
>> The data is from 1838 to 2016, daily (sparse at times). We need to retain it, therefore the default retention policy.
>>
>> Thanks,
>> Tanya
>>
>> On 13 October 2016 at 06:26, Sean Beckett <[email protected]> wrote:
>>
>>> Tanya, what range of time does your data cover? What are the retention policies on the database?
>>>
>>> On Tue, Oct 11, 2016 at 11:14 PM, Tanya Unterberger <[email protected]> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> 1. Initially I killed the process
>>>> 2. At some point I restarted the influxdb service
>>>> 3. The error logs show no errors
>>>> 4. I rebuilt the server and installed the latest rpm, then reimported the data via scripts. The data goes in, but the server is unusable; it looks like indexing might be stuffed. The size of the data in that database is 38M; the total size of /var/lib/influxdb/data/ is 273M
>>>> 5. CPU went berserk and doesn't come down
>>>> 6. A query like select count(blah) against the measurement that was batch inserted (10k records at a time) is unusable and times out
>>>> 7. I need to import around 15 million records. How should I throttle that?
>>>>
>>>> At the moment I am pulling my hair out (not a pretty sight)
>>>>
>>>> Thanks a lot!
>>>> Tanya
>>>>
>>>> On 12 October 2016 at 06:11, Sean Beckett <[email protected]> wrote:
>>>>
>>>>> On Tue, Oct 11, 2016 at 12:11 AM, <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It seems that the old issue (#3349) might have surfaced again in v1.0.
>>>>>>
>>>>>> I tried to insert a large number of records (3913595) via a script, inserting 10000 rows at a time.
>>>>>>
>>>>>> After a while I received
>>>>>>
>>>>>> HTTP/1.1 500 Internal Server Error
>>>>>> Content-Type: application/json
>>>>>> Request-Id: ac8ebbbe-8f70-11e6-8ce7-000000000000
>>>>>> X-Influxdb-Version: 1.0.0
>>>>>> Date: Tue, 11 Oct 2016 05:12:02 GMT
>>>>>> Content-Length: 20
>>>>>>
>>>>>> {"error":"timeout"}
>>>>>> HTTP/1.1 100 Continue
>>>>>>
>>>>>> I killed the process, after which the whole box became pretty much unresponsive.
>>>>>
>>>>> Killed the InfluxDB process, or the batch writing script process?
>>>>>> There is nothing in the logs (i.e. sudo ls /var/log/influxdb/ gives me nothing), although the setting for http logging is true:
>>>>>
>>>>> systemd OSes put the logs in a new place (yay!?). See http://docs.influxdata.com/influxdb/v1.0/administration/logs/#systemd for how to read the logs.
>>>>>
>>>>>> [http]
>>>>>>   enabled = true
>>>>>>   bind-address = ":8086"
>>>>>>   auth-enabled = true
>>>>>>   log-enabled = true
>>>>>>
>>>>>> I tried to restart influx, but got the following error:
>>>>>>
>>>>>> Failed to connect to http://localhost:8086
>>>>>> Please check your connection settings and ensure 'influxd' is running.
>>>>>
>>>>> The `influx` console is just a fancy wrapper on the API. That error doesn't mean much except that the HTTP listener in InfluxDB is not yet up and running.
>>>>>
>>>>>> Although I can see that influxd is up and running:
>>>>>>
>>>>>> > systemctl | grep influx
>>>>>> influxdb.service    loaded active running    InfluxDB is an open-source, distributed, time series database
>>>>>>
>>>>>> What do I do now?
>>>>>
>>>>> Check the logs as referenced above.
>>>>>
>>>>> The unresponsiveness on startup isn't surprising. It sounds like the system was overwhelmed with writes, which means the WAL would have many points cached, waiting to be flushed to disk. On restart, InfluxDB won't accept new writes or queries until the cached points in the WAL have been persisted. For this reason, the HTTP listener stays off until the WAL is flushed.
>>>>>
>>>>>> I tried the same import over the weekend; the script timeout eventually happened, but the result was the same unresponsive, unusable server. We rebuilt the box and started again.
>>>>>
>>>>> It sounds like the box is just overwhelmed. Did you get backoff messages from the writes before the crash? What are the machine specs?
>>>>>
>>>>>> Perhaps it is worth mentioning that the same measurement already contained about 9 million records. Some of these records had the same timestamp as the ones I tried to import, i.e. they should have been merged.
>>>>>
>>>>> Overwriting points is much, much more expensive than posting new points. Each overwritten point triggers a tombstone record which must later be processed. This can trigger frequent compactions of the TSM files. With a high write load and frequent compactions, the system would come under significant CPU pressure.
>>>>>
>>>>>> Interestingly enough, the same amount of data was fine when I forgot to add the precision in ms, i.e. all records were imported as nanoseconds, but in fact they "lacked" 6 zeroes.
>>>>>
>>>>> That would mean all points are going to the same shard. It is more resource-intensive to load points across a wide range of time, since more shard files are involved. InfluxDB does best with sequential, chronologically ordered, unique points from the very recent past. The more the write operation differs from that, the lower the throughput.
>>>>>
>>>>>> Please advise what kind of action I can take.
>>>>>
>>>>> Look in the logs for errors. Throttle the writes. Don't overwrite more points than you have to.
>>>>>
>>>>>> Thanks a lot!
>>>>>> Tanya
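On the throttling question above (item 7): one rough way to pace a 15-million-point import is to split the line-protocol export into smaller batches and pause between POSTs, watching the journal for timeouts. This is only a sketch, not a recipe; the file name points.lp, the database name mydb, the batch size, and the sleep are placeholders to tune for your hardware, and the credentials are needed because auth-enabled is true in your [http] config:

# Split the export into 5000-line chunks (smaller than the 10k batches used before).
split -l 5000 points.lp chunk_

for f in chunk_*; do
  # precision=ms because the timestamps are milliseconds.
  curl -s -u "$INFLUX_USER:$INFLUX_PASS" \
    -XPOST 'http://localhost:8086/write?db=mydb&precision=ms' \
    --data-binary @"$f" || echo "write failed for $f"
  # Pause between batches; lengthen this if CPU stays pegged or writes
  # start returning {"error":"timeout"}.
  sleep 1
done

# On a systemd box the server logs live in the journal, not /var/log/influxdb:
sudo journalctl -u influxdb

Whether 5000 points per batch and a one-second pause is right depends entirely on the box; the idea is just to give compactions time to keep up between batches.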
--
Sean Beckett
Director of Support and Professional Services
InfluxDB
