Re: Bulk indexing creates a lot of disk read OPS

Eran Fri, 24 Apr 2015 01:46:16 -0700

Hey David,

I suspect it might indeed be the cause, but I'm kind of a newbie here.
Which metric do I need to monitor, what would be a problematic value, and
how can I play with the merge settings to test whether I can improve this?
Some rules of thumb for a newbie would be appreciated.
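
To make the "which metric" question concrete, here is a rough sketch of comparing two snapshots of merge counters taken a known interval apart. It assumes the counters come from the `merges` section of the indices stats response (for example under `_all.primaries.merges` from `GET /<index>/_stats`), with the field names `current`, `total`, `total_time_in_millis` and `total_size_in_bytes`. The function name and the sample numbers are made up for illustration, and the sketch does not claim any threshold for what counts as a problematic value.

```python
def merge_activity(before, after, interval_s):
    """Summarise merge work between two snapshots of the 'merges' stats section.

    before/after are dicts with the (assumed) keys 'current', 'total',
    'total_time_in_millis' and 'total_size_in_bytes'; interval_s is the
    number of seconds between the two snapshots.
    """
    merges_done = after["total"] - before["total"]
    merge_ms = after["total_time_in_millis"] - before["total_time_in_millis"]
    bytes_merged = after["total_size_in_bytes"] - before["total_size_in_bytes"]
    return {
        # Completed merges, normalised to a per-minute rate.
        "merges_per_min": merges_done * 60.0 / interval_s,
        # Merge time accumulated per second of wall clock; this can exceed
        # 1.0 when several merges run concurrently.
        "merge_time_fraction": merge_ms / (interval_s * 1000.0),
        # Volume of segment data rewritten by merges, in MB per second.
        "merged_mb_per_s": bytes_merged / (1024.0 * 1024.0) / interval_s,
        # Merges in flight at the time of the second snapshot.
        "running_now": after["current"],
    }


# Hypothetical snapshots taken 60 seconds apart (numbers are invented).
before = {"current": 1, "total": 100, "total_time_in_millis": 60000,
          "total_size_in_bytes": 100 * 1024 * 1024}
after = {"current": 3, "total": 110, "total_time_in_millis": 90000,
         "total_size_in_bytes": 700 * 1024 * 1024}
summary = merge_activity(before, after, 60)
```

Watching whether `merged_mb_per_s` and the disk read IOPS rise together during a bulk load would be one way to test whether merging explains the reads.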


I installed the plugin SegmentSpy, and here is a screenshot, if that helps.

Eran

On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>
> Merging segments could be the cause here?
>
> David
>
> On 24 Apr 2015 at 09:54, Eran <era...@gmail.com> wrote:
>
> Forgot some stats:
>
> I have 10 shards, no replicas, all on the same machine.
> ATM, there are some 1.5 billion records in the index.
>
>
> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>
>> Attachments enclosed.
>>
>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>>
>>> Hello,
>>>
>>> I've created an index I use for logging.
>>>
>>> This means there are mostly writes, and some searches once in a while.
>>> During the initial load phase, I'm using several clients to 
>>> concurrently index documents using the bulk API.
>>>
>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>>
>>> I am using an EC2 c3.8xl machine with 32 cores and 60 GB of memory, 
>>> with a Provisioned IOPS volume set to 7000 IOPS.
>>>
>>> Looking at the metrics, I see that the CPU and memory are fine, the 
>>> write IOPS are at 300, but the read IOPS have slowly gone up and got to 
>>> 7000.
>>>
>>> How come I'm only indexing, but most of the IOPS are read?
>>>
>>> I am attaching some screen captures from the BigDesk plugin that show 
>>> the two states of the index. The point about 20% of the way into each 
>>> graph is where I stopped the clients, so you can see the load drop off.
>>>
>>> My settings are:
>>>
>>> threadpool.bulk.type: fixed
>>> threadpool.bulk.size: 32                 # availableProcessors
>>> threadpool.bulk.queue_size: 1000
>>>
>>> # Indices settings
>>> indices.memory.index_buffer_size: 50%
>>> indices.cache.filter.expire: 6h
>>>
>>> bootstrap.mlockall: true
>>>
>>>
>>> and I've changed the index settings to:
>>>
>>>
>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>>> I also tried "refresh_interval":"-1"
>>>
>>>
>>> Please let me know what else I should provide (settings, logs, 
>>> metrics).
>>>
>>>  -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
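
For anyone reproducing the index settings quoted above, the following is a minimal sketch that builds the same JSON body and lets the refresh interval be switched to "-1". The thread does not say how the settings were applied; sending this body to the update index settings endpoint (`PUT /<index>/_settings`) is an assumption, and the helper name is made up for illustration.

```python
import json


def bulk_load_settings(refresh_interval="60m"):
    """Return the index settings from the quoted message as a nested dict.

    refresh_interval is "60m" as in the quoted settings; pass "-1" for the
    variant that was also tried.
    """
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "translog": {
                "flush_threshold_size": "1gb",
                "flush_threshold_ops": "50000",
            },
        }
    }


# Serialise the "-1" variant as a request body (nothing is sent here).
body = json.dumps(bulk_load_settings("-1"))
```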

