Re: Bulk indexing creates a lot of disk read OPS

Eran Fri, 24 Apr 2015 04:55:03 -0700

Wow, awesome. I'll try that, thanks!

On Friday, April 24, 2015 at 2:17:45 PM UTC+3, 
christian...@elasticsearch.com wrote:
>
> Hi Eran,
>
> If you are assigning your own ID, Elasticsearch needs to search and check 
> if the document already exists before writing it. This could explain why 
> the bulk insert performance goes down as the size of the index grows. If 
> you are not going to update the documents, I would therefore recommend 
> allowing Elasticsearch to assign the document ID automatically.
>
> Best regards,
>
> Christian
>
>
>
> On Friday, April 24, 2015 at 7:49:56 AM UTC+1, Eran wrote:
>>
>> Hello,
>>
>> I've created an index I use for logging.
>>
>> This means there are mostly writes, and some searches once in a while.
>> In the phase of the first loading, I'm using several clients to 
>> concurrently index documents using the bulk API.
>>
>> At first, indexing takes 200 ms for a bulk of 5000 documents.
>> As time goes by, the indexing time increases, and gets to 1000-4500 ms.
>>
>> I am using an EC2 c3.8xlarge machine with 32 cores and 60 GB of memory, with 
>> a provisioned IOPS volume set to 7000 IOPS.
>>
>> Looking at the metrics, I see that the CPU and memory are fine, the write 
>> IOPS are at 300, but the read IOPS have slowly gone up and got to 7000.
>>
>> How come I'm only indexing, but most of the IOPS are read?
>>
>> I am attaching some screen captures from the BigDesk plugin that show 
>> the two states of the index. The point about 20% into the graphs is 
>> where I stopped the clients, so you can see the load drop off.
>>
>> My settings are:
>>
>> threadpool.bulk.type: fixed
>> threadpool.bulk.size: 32                 # availableProcessors
>> threadpool.bulk.queue_size: 1000
>>
>> # Indices settings
>> indices.memory.index_buffer_size: 50%
>> indices.cache.filter.expire: 6h
>>
>> bootstrap.mlockall: true
>>
>>
>> and I've changed the index settings to:
>>
>>
>> {"index":{"refresh_interval":"60m","translog":{"flush_threshold_size":"1gb","flush_threshold_ops":"50000"}}}
>> I also tried "refresh_interval":"-1"
>>
>>
>> Please let me know what else I need to provide (settings, logs, 
>> metrics).
>>
>>
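
For readers of the archive, Christian's suggestion can be sketched as follows. This is a minimal illustration, not code from the thread: the helper name `bulk_body`, the index name `logs`, and the type name `event` are all hypothetical. It builds the newline-delimited body that the bulk API expects and simply leaves `_id` out of each action line, so Elasticsearch assigns the ID itself and, as described above, does not have to look up an existing document first. The `_type` field reflects the 2015-era API and is assumed to be optional here so it can be dropped on versions that no longer use mapping types.

```python
import json


def bulk_body(docs, index, doc_type=None, id_field=None):
    """Build a newline-delimited bulk request body (hypothetical helper).

    If id_field is None, no "_id" is sent in the action line, so
    Elasticsearch generates the document ID itself.
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # assumption: 2015-era API with types
        if id_field is not None:
            meta["_id"] = doc[id_field]  # client-assigned ID
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # document source line
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


docs = [{"id": "a1", "msg": "started"}, {"id": "a2", "msg": "stopped"}]

# Auto-generated IDs: no "_id" in the action lines.
auto_ids = bulk_body(docs, "logs", "event")

# Client-assigned IDs, for comparison.
own_ids = bulk_body(docs, "logs", "event", id_field="id")
```

The resulting string would be sent as the body of a `_bulk` request by whatever HTTP client is already in use. This only fits the case Christian describes, where the documents are never updated by ID afterwards.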


