Re: Bulk indexing slow down when data amount increase

Eric Lu Tue, 14 Jan 2014 22:25:28 -0800

I have set the replica count to 0 and the queue size to 50, and it can now 
index about 7-8 million documents per hour. That's acceptable, though I 
don't know which change made the difference.
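
For anyone following along, here is a minimal sketch of the index-level part of that change, using only the Python standard library. It builds the request for the index settings update rather than sending it. The index name "myindex" and the helper names are hypothetical; `index.number_of_replicas` and `index.refresh_interval` are the standard index settings. The queue size is a node-level thread pool setting and is not covered here.

```python
import json


def bulk_load_settings(refresh_interval="30s"):
    """Index settings to apply before a large bulk load (no replicas)."""
    return {"index": {"number_of_replicas": 0,
                      "refresh_interval": refresh_interval}}


def post_load_settings(replicas=1, refresh_interval="1s"):
    """Index settings to restore once the bulk load has finished."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


def settings_request(index_name, settings):
    """Return (method, path, body) for an index settings update."""
    return "PUT", "/%s/_settings" % index_name, json.dumps(settings, sort_keys=True)


if __name__ == "__main__":
    # "myindex" is a placeholder index name, not one from this thread.
    for settings in (bulk_load_settings(), post_load_settings()):
        method, path, body = settings_request("myindex", settings)
        print(method, path, body)
```

The request is only built and printed, so it can be sent with whatever HTTP client is already in use.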


Thank you all.

On Monday, January 13, 2014 at 9:04:35 PM UTC+8, Eric Lu wrote:
>
> I observed that GC occurred once every 15 seconds when heap usage was at 
> 75% of the heap size. Is that too frequent? There are no OOMs.
>
> I set the refresh interval to 30s.
>
> I'll try a smaller queue and set the replica count to 0.
>
> Thank you.
>
> On Monday, January 13, 2014 at 8:42:56 PM UTC+8, Jörg Prante wrote:
>>
>> 12 hours is an absurdly long time for indexing 10 million docs.
>>
>> queue:1000 is much too high for production. For testing it may be OK (it 
>> effectively disables queue rejections), but in production you risk 
>> starving your cluster of resources.
>>
>> Do you monitor the resource usage of ES, especially the heap? Is GC 
>> starving your cluster? Do you see OOMs?
>>
>> Do you evaluate the bulk responses for errors? Do you throttle bulk 
>> request concurrency? 
>>
>> Do you set refresh interval to -1? 
>>
>> Hint: if 5 nodes is your maximum, you can also bulk index with 5 shards 
>> and replica level 0. After the bulk load, you can increase the replica 
>> level to 1.
>>
>> Jörg
>>
>>
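On the question of evaluating bulk responses for errors, a minimal sketch of what that check might look like, assuming the response body has already been parsed from JSON into a dict. The function name and the sample response are hypothetical and hand-written for illustration; the only thing relied on is the documented shape of a bulk response, an `items` list where each entry is keyed by its action and carries an `error` field when it failed.

```python
def failed_bulk_items(response):
    """Return (position, action, status, error) for each failed bulk item."""
    failures = []
    for position, item in enumerate(response.get("items", [])):
        # Each item is a one-key dict such as {"index": {...}} or {"create": {...}}.
        for action, result in item.items():
            status = result.get("status")
            # Treat an explicit error, or an HTTP-style status >= 400, as a failure.
            if "error" in result or (status is not None and status >= 400):
                failures.append((position, action, status, result.get("error")))
    return failures


if __name__ == "__main__":
    # Hand-written sample, not output captured from a real cluster.
    sample = {
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"index": {"_id": "2", "status": 429,
                       "error": "queue full (sample text)"}},
        ]
    }
    for failure in failed_bulk_items(sample):
        print(failure)
```

A client could retry only the failed positions after a pause, which also acts as a simple throttle when the bulk queue is rejecting requests.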

