That’s normal. I was just pointing out that even if you think you are only writing 
data while indexing, Elasticsearch is also reading data behind the scenes to merge 
Lucene segments.
You could try playing with index.translog.flush_threshold_size and increasing the 
transaction log size:

http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-translog.html

It might help reduce the number of segments generated, but that said, you will 
always have read operations.
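
To see how much of that read traffic comes from merging, you can watch the merge 
statistics and segment counts. A quick sketch (assuming a node reachable on 
localhost:9200; adjust host and port to your setup):

```shell
# Per-node merge statistics: number of merges, docs and bytes merged,
# and total time spent merging. Assumes the default HTTP port.
curl -s 'http://localhost:9200/_nodes/stats/indices/merges?pretty'

# One line per segment per shard; many small segments means the merge
# scheduler has plenty of work (and reads) ahead of it.
curl -s 'http://localhost:9200/_cat/segments?v'
```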

Actually, is it an issue for you? If not, keeping all the default values is 
probably fine.
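
If you do want to try it, a settings update could look like this (just a sketch: 
"logs" is a placeholder index name and 2gb is an illustrative value, not a 
recommendation):

```shell
# Raise the translog flush threshold so flushes (and the resulting new
# segments) happen less often. "logs" and "2gb" are placeholders.
curl -s -XPUT 'http://localhost:9200/logs/_settings' -d '{
  "index": {
    "translog": {
      "flush_threshold_size": "2gb"
    }
  }
}'
```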

Best


-- 
David Pilato - Developer | Evangelist 
elastic.co
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>





> On 24 Apr 2015, at 10:45, Eran <era...@gmail.com> wrote:
> 
> Hey David,
> 
> I suspect it indeed might be the cause, but I'm kind of a newbie here. 
> What metric do I need to monitor, what would be a problematic value, and 
> basically, how can I play with merge settings to test if I can improve this?
> Some rules of thumb for a newbie would be appreciated.
> 
> I installed the plugin SegmentSpy, and here is a screenshot, if that helps.
> 
> Eran
> 
> On Friday, April 24, 2015 at 11:02:27 AM UTC+3, David Pilato wrote:
> Could merging segments be the cause here?
> 
> David
> 
> On 24 Apr 2015, at 09:54, Eran <era...@gmail.com> wrote:
> 
>> Forgot some stats:
>> 
>> I have 10 shards, no replicas, all on the same machine.
>> ATM, there are some 1.5 billion records in the index.
>> 
>> 
>> On Friday, April 24, 2015 at 10:18:27 AM UTC+3, Eran wrote:
>> Attachments are attached to this message.
>> 
>> On Friday, April 24, 2015 at 9:49:56 AM UTC+3, Eran wrote:
>> Hello,
>> 
>> I've created an index I use for logging.
>> 
>> This means there are mostly writes, and some searches once in a while.
>> During the initial load, I'm using several clients to concurrently index 
>> documents using the bulk API.
>> 
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>> 
>> I am using an EC2 c3.8xl machine with 32 cores, and 60 GB of memory, with an 
>> IO provisioned volume set to 7000 IOPS.
>> 
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>> 
>> How come I'm only indexing, but most of the IOPS are read?
>> 
>> I am attaching some screen captures from the BigDesk plugin that show the 
>> two states of the index; at about 20% of the graphs is the point in time 
>> where I stopped the clients, so you can see the load drop off.
>> 
>> My settings are:
>> 
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32                 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>> 
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>> indices.cache.filter.expire: 6h
>> 
>> bootstrap.mlockall: true
>> 
>> 
>> and I've changed the index settings to:
>> 
>> {
>>   "index": {
>>     "refresh_interval": "60m",
>>     "translog": {
>>       "flush_threshold_size": "1gb",
>>       "flush_threshold_ops": "50000"
>>     }
>>   }
>> }
>> I also tried "refresh_interval":"-1"
>> 
>> 
>> Please let me know what else I need to provide (settings, logs, metrics).
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/a64e78f3-5d69-4ca1-a3c9-86735a25343d%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> For more options, visit https://groups.google.com/d/optout 
>> <https://groups.google.com/d/optout>.
> 
> 
> <Screen Shot 2015-04-24 at 11.42.16.png>

