Does it make sense that this could be because everything is in a different retention policy? Each retention policy gets its own shards, and each shard has its own WAL segments and in-memory cache, so I'd guess hundreds of retention policies multiply the memory requirement on the influxd side. When I went back to writing everything to autogen, memory usage on the influxd process was greatly reduced. I also tried lowering the threshold at which the cache is flushed to disk (from the default 25MB to 15MB); that helped a little, but not much.
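For reference, the setting I lowered is cache-snapshot-memory-size in the [data] section of influxdb.conf. As far as I can tell the value is given in bytes on 1.1/1.2, and both limits apply per shard (the cache-max-memory-size line is just the default from my config, so double-check against yours):

    [data]
      # Snapshot a shard's in-memory cache to a TSM file once it reaches this size.
      # Default is 26214400 (25MB); I dropped it to 15728640 (15MB).
      cache-snapshot-memory-size = 15728640

      # Hard per-shard ceiling on the cache before writes start being rejected.
      cache-max-memory-size = 1048576000

Since the limits are per shard, and every retention policy gets its own shards, lowering the snapshot threshold can only go so far when there are hundreds of retention policies.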
On Monday, January 30, 2017 at 11:54:38 AM UTC-5, Jeffery K wrote:

Also, I'm not sure if it matters, but each of these separate measurements (up to around 450 now, with a goal of 1000) is being written to its own retention policy. I don't know whether that factors into the increased memory usage on influxdb.

On Monday, January 30, 2017 at 10:54:09 AM UTC-5, Jeffery K wrote:

OK, so making the batch writes less frequent had a large positive effect, but memory usage still climbs until the process runs out after 20-25 minutes. This is when testing against the latest release, which advertises "50% better write performance". Any thoughts on other tuning parameters that would let influx release memory sooner, or take less memory overall? I would think 20GB is enough for this load. Attached is the memory profile of the process on my last run; the left axis is megabytes. Where the line stops is where the process was killed by the OOM killer.

On Friday, January 27, 2017 at 10:59:34 AM UTC-5, Jeffery K wrote:

Only one tag per measurement, which I'm not even really using, so I can probably do away with it. It's just a string that is always the same for all inserts: a unique identifier for the measurement. Since that same string is already in the measurement name, the tag is a little redundant, but it shouldn't affect cardinality because its value never changes.

This morning, as I responded to the earlier post, it occurred to me that I should increase the time between batches, since decreasing it might actually put *more* load on influx; granted, each batch will now carry more data. So, fewer batches, but more data per batch. I changed each per-measurement thread to send only every 5 seconds, and so far the memory footprint on influxdb is smaller. It's been running for 8 minutes and is only using 14.4GB, but it is climbing at a steady rate... maybe it will be able to handle things this way. I'll do some more testing and perhaps add more inserts, since I'm trying to get to 1000 measurements with this same insert load.

On Friday, January 27, 2017 at 10:47:07 AM UTC-5, Paul Dix wrote:

The concurrency is fine. Should be easy for it to handle. Are you using tags?

On Fri, Jan 27, 2017 at 10:39 AM, Jeffery K <jeffer...@sightlinesystems.com> wrote:

I checked the schema of one of the measurements (they should all be the same, since we're inserting the same data into multiple measurements). There are 74 field keys, all float type. The field key names are just strings, some with spaces and sometimes characters like % or parentheses, but we've done the proper escaping for the inserts. The line protocol is correct as far as I know, since it's accepted and the fields and data look correct.

We are using a separate thread for each of the 230 measurements (we want to get to 1000, but only got this many) to do the inserts. Is there a problem with this level of concurrency in influx? To elaborate: for each measurement, our client creates an HTTP writer (in Java) and feeds it a batch of writes that have been read and converted to line protocol in the last 100ms (we initially had this at 500ms, but lowered it to reduce load on influx).
After each batch "commit", we re-create the http >>>>> connection, for the next batch insert. >>>>> >>>>> I initially had problems testing on the newer influx 1.2 since it >>>>> seems the json of returns has changed to have a statement_id, and my >>>>> jackson converter to java object didn't like that, but I've since fixed >>>>> that and re-tested this scenario with Influx 1.2, and it behaves the same >>>>> way. gets killed very quickly by the OOM killer, using around 22GB of >>>>> RAM. >>>>> >>>>> >>>>> >>>>> On Friday, January 27, 2017 at 10:06:04 AM UTC-5, Paul Dix wrote: >>>>>> >>>>>> My guess is there's an issue with your schema. What does the data >>>>>> look like? >>>>>> >>>>>> On Thu, Jan 26, 2017 at 6:34 PM, Jeffery K < >>>>>> jeffer...@sightlinesystems.com> wrote: >>>>>> >>>>>>> We are using http to write inserts to the /write endpoint and after >>>>>>> a very short duration, influx is being kill by the linux OS for out of >>>>>>> memory. >>>>>>> We've increased the memory 4 fold from the initial value, and influx >>>>>>> now has 20GB, but still it is crashing fairly quickly. We are >>>>>>> performing >>>>>>> about only 230,000 inserts. 1000 inserts into 230 measurements. We are >>>>>>> using batching and flushing every 100 milliseconds. >>>>>>> >>>>>>> We don't believe the data is more than 1GB, but yet, 20 GB of memory >>>>>>> isn't enough to write this?! >>>>>>> >>>>>>> This was done with 1.1.0 version. >>>>>>> >>>>>>> -- >>>>>>> Remember to include the version number! >>>>>>> --- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "InfluxData" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>> send an email to influxdb+u...@googlegroups.com. >>>>>>> To post to this group, send email to infl...@googlegroups.com. >>>>>>> Visit this group at https://groups.google.com/group/influxdb. >>>>>>> To view this discussion on the web visit >>>>>>> https://groups.google.com/d/msgid/influxdb/67ec84d0-4320-4044-b881-1b23839b2964%40googlegroups.com >>>>>>> >>>>>>> <https://groups.google.com/d/msgid/influxdb/67ec84d0-4320-4044-b881-1b23839b2964%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>> . >>>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>>> >>>>>> >>>>>> -- >>>>> Remember to include the version number! >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "InfluxData" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to influxdb+u...@googlegroups.com. >>>>> To post to this group, send email to infl...@googlegroups.com. >>>>> Visit this group at https://groups.google.com/group/influxdb. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/influxdb/25f852c4-2b95-4d38-a250-941c55f54a88%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/influxdb/25f852c4-2b95-4d38-a250-941c55f54a88%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> -- Remember to include the version number! --- You received this message because you are subscribed to the Google Groups "InfluxData" group. To unsubscribe from this group and stop receiving emails from it, send an email to influxdb+unsubscr...@googlegroups.com. To post to this group, send email to influxdb@googlegroups.com. Visit this group at https://groups.google.com/group/influxdb. 
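In case it helps anyone reading later, here is a rough sketch of the per-measurement writer described in the quoted thread, using nothing but HttpURLConnection. This is not our actual client; the host, database name, measurement, and field key below are made up, and it writes to rp=autogen rather than one retention policy per measurement:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // One writer per measurement; points queued by the collector are flushed
    // to InfluxDB's /write endpoint every flushIntervalMs.
    public class MeasurementWriter implements Runnable {

        private final String measurement;                  // e.g. "server_42_stats" (made up)
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final long flushIntervalMs = 5000;         // was 100 ms; 5 s batches are much gentler

        public MeasurementWriter(String measurement) {
            this.measurement = measurement;
        }

        // Line protocol: spaces, commas, and '=' in a field key are backslash-escaped;
        // '%' and parentheses need no escaping. Timestamp is in nanoseconds.
        // No tags here: the one tag we had carried the same value on every point.
        public void add(double value, long epochNanos) {
            queue.add(measurement + " cpu\\ usage\\ (%)=" + value + " " + epochNanos);
        }

        @Override
        public void run() {
            StringBuilder batch = new StringBuilder();
            try {
                while (true) {
                    long deadline = System.currentTimeMillis() + flushIntervalMs;
                    String line;
                    // Drain whatever arrives during this flush window into one batch.
                    while ((line = queue.poll(Math.max(1, deadline - System.currentTimeMillis()),
                                              TimeUnit.MILLISECONDS)) != null) {
                        batch.append(line).append('\n');
                        if (System.currentTimeMillis() >= deadline) break;
                    }
                    if (batch.length() > 0) {
                        post(batch.toString());
                        batch.setLength(0);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        // A fresh HttpURLConnection per batch, mirroring the "re-create the
        // http connection" behaviour described above.
        private void post(String body) throws Exception {
            URL url = new URL("http://localhost:8086/write?db=mydb&rp=autogen&precision=ns");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();            // InfluxDB returns 204 on success
            conn.disconnect();
            if (status != 204) {
                System.err.println(measurement + ": write failed, HTTP " + status);
            }
        }
    }

Our real client starts one of these threads per measurement (230 today, aiming for 1000), which is the concurrency level discussed above; moving them all onto autogen instead of one retention policy each is what brought memory back down.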