It is an issue, as I am hitting 7000 read operations per second (the limit of my volume's provisioned IOPS).

As the index grows larger the problem worsens: where I was once able to index with 10 clients concurrently, I can now barely use one. I also used the _optimize endpoint to get all segments merged down, and even then the read operations spike immediately on the first indexing operation (I am using BigDesk to follow this). So I do not think it is a merge effect, as my intuition is that a merge only happens every once in a while. Maybe this is actually a result of me not using doc values? Could that be it? To make the question concrete, I've put a few sketches of what I mean below.
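About doc values: as far as I understand they have to be enabled per field in the mapping and only apply to newly indexed data, so testing this would mean creating a new index and reindexing. Something along these lines is what I have in mind (a sketch only; the index, type, and field names are placeholders, not my real mapping):

curl -XPUT 'localhost:9200/logs_v2' -d '{
  "mappings": {
    "event": {
      "properties": {
        "timestamp": { "type": "date",   "doc_values": true },
        "level":     { "type": "string", "index": "not_analyzed", "doc_values": true }
      }
    }
  }
}'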
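For reference, these are roughly the calls I am using to force the merge, and then to verify the segment counts afterwards ('logs' stands in for my real index name; _cat/segments is the plugin-free way I know to see per-shard segments):

# force a merge down to one segment per shard
curl -XPOST 'localhost:9200/logs/_optimize?max_num_segments=1'

# list per-shard segments to verify
curl 'localhost:9200/_cat/segments/logs?v'

# merge statistics, to see whether merges are actually running
curl 'localhost:9200/logs/_stats/merge?pretty'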
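And for completeness, the index settings change quoted below was applied live, roughly like this (again, 'logs' is a placeholder):

curl -XPUT 'localhost:9200/logs/_settings' -d '{
  "index": {
    "refresh_interval": "60m",
    "translog": {
      "flush_threshold_size": "1gb",
      "flush_threshold_ops": 50000
    }
  }
}'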
On Friday, April 24, 2015 at 12:28:50 PM UTC+3, David Pilato wrote:

> That's normal. I was just answering that even if you think you are only
> writing data while indexing, you are also reading data behind the scenes to
> merge Lucene segments.
> You can potentially try to play with index.translog.flush_threshold_size
>
> http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html
>
> and increase the transaction log size.
>
> It might help reduce the number of segments generated, but that said, you
> will always have READ operations.
>
> Actually, is it an issue for you? If not, keeping all the default values
> might be good.
>
> Best
>
> --
> David Pilato - Developer | Evangelist
> elastic.co
> @dadoonet | @elasticsearchfr | @scrutmydocs
>
> On 24 Apr 2015, at 10:45, Eran <era...@gmail.com> wrote:
>
> Hey David,
>
> I suspect it indeed might be the cause, but I'm kind of a newbie here.
> What metric do I need to monitor, what would be a problematic value, and,
> basically, how can I play with the merge settings to test whether I can
> improve this? Some rules of thumb for a newbie would be appreciated.
>
> I installed the SegmentSpy plugin; here is a screenshot, if that helps.
>
> Eran
>
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
>>
>> Merging segments could be the cause here?
>>
>> David
>>
>> On 24 Apr 2015, at 09:54, Eran <era...@gmail.com> wrote:
>>
>> Forgot some stats:
>>
>> I have 10 shards, no replicas, all on the same machine.
>> ATM there are some 1.5 billion records in the index.
>>
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>>>
>>> Attachments hereby.
>>>
>>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've created an index I use for logging.
>>>>
>>>> This means there are mostly writes, and some searches once in a while.
>>>> During the initial load, I'm using several clients to concurrently
>>>> index documents using the bulk API.
>>>>
>>>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>>>> As time goes by, the indexing time increases and reaches 1000-4500 ms.
>>>>
>>>> I am using an EC2 c3.8xlarge machine with 32 cores and 60 GB of memory,
>>>> with a Provisioned IOPS volume set to 7000 IOPS.
>>>>
>>>> Looking at the metrics, I see that CPU and memory are fine and the
>>>> write IOPS are at 300, but the read IOPS have slowly gone up and reached
>>>> 7000.
>>>>
>>>> How come I am only indexing, yet most of the IOPS are reads?
>>>>
>>>> I am attaching some screen captures from the BigDesk plugin that show
>>>> the two states of the index. At about 20% of the graphs is the point in
>>>> time where I stopped the clients, so you can see the load drop off.
>>>>
>>>> My settings are:
>>>>
>>>> threadpool.bulk.type: fixed
>>>> threadpool.bulk.size: 32 # availableProcessors
>>>> threadpool.bulk.queue_size: 1000
>>>>
>>>> # Indices settings
>>>> indices.memory.index_buffer_size: 50%
>>>> indices.cache.filter.expire: 6h
>>>>
>>>> bootstrap.mlockall: true
>>>>
>>>> and I've changed the index settings to:
>>>>
>>>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>>>>
>>>> I also tried "refresh_interval": "-1".
>>>>
>>>> Please let me know what else I need to provide (settings, logs, metrics).
>
> <Screen Shot 2015-04-24 at 11.42.16.png>
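PS: David, regarding playing with the merge settings — if merge I/O does turn out to be part of this, is the store throttle the right knob to try? This is what I have in mind (a sketch only, not applied yet; the 100mb value is my guess):

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.store.throttle.type": "merge",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'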