Elasticsearch throttles merges by default so that they don't slow search
down too much. This is usually preferable for mixed read/write loads, but in
your case it looks like you batch-indexed a lot of documents at once and
merges couldn't keep up with the indexing rate, so you ended up with a very
high number of segments. Merge throttling also applies to optimize calls,
which might explain why your calls to the optimize API take so long to
return.

Could you try to disable merge throttling[1] before running a call to
optimize again to see if the situation improves?

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-store.html#store-throttling
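
For reference, here is a rough sketch of what that could look like from
Python using only the standard library. The setting name
(indices.store.throttle.type) is the one described on the page linked in [1]
for the 1.x line; the host, port and the index name "my_index" are
placeholders I made up, so adjust them to your cluster. The setting is applied
as transient, so it does not survive a full cluster restart.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a node reachable at this address
INDEX = "my_index"                # hypothetical index name, replace with yours


def disable_throttle_request(base_url=ES_URL):
    """Build a cluster settings update that turns store throttling off."""
    body = {"transient": {"indices.store.throttle.type": "none"}}
    return urllib.request.Request(
        url=base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


def optimize_request(index=INDEX, max_num_segments=5, base_url=ES_URL):
    """Build an optimize call that asks for at most max_num_segments per shard."""
    url = "%s/%s/_optimize?max_num_segments=%d" % (base_url, index, max_num_segments)
    return urllib.request.Request(url=url, data=b"", method="POST")


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(disable_throttle_request()) followed by send(optimize_request())
would apply the setting and then start the optimize. Once the segment count is
back to something reasonable, you would probably want to set the throttle type
back to its previous value (I believe "merge" is the default in 1.x) so that
merges don't hurt search latency during normal indexing.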


On Fri, Apr 4, 2014 at 5:27 PM, Elliott Bradshaw <[email protected]> wrote:

> Any thoughts on this?  I've run optimize several more times, and the
> number of segments falls each time, but I'm still over 1000 segments per
> shard.  Has anyone else run into something similar?
>
>
> On Thursday, April 3, 2014 11:21:29 AM UTC-4, Elliott Bradshaw wrote:
>>
>> OK.  Optimize finally returned, so I suppose something was happening in
>> the background, but I'm still seeing over 6500 segments, even after
>> setting max_num_segments=5.  Does this seem right?  Queries are a little
>> faster (350-400ms) but still not great.  Bigdesk is still showing a fair
>> amount of file IO.
>>
>> On Thursday, April 3, 2014 8:47:32 AM UTC-4, Elliott Bradshaw wrote:
>>>
>>> Hi All,
>>>
>>> I've recently upgraded to Elasticsearch 1.1.0.  I've got a 4 node
>>> cluster, each with 64G of ram, with 24G allocated to Elasticsearch on
>>> each.  I've batch loaded approximately 86 million documents into a single
>>> index (4 shards) and have started benchmarking cross_field/multi_match
>>> queries on them.  The index has one replica and takes up a total of 111G.
>>> I've run several batches of warming queries, but queries are not as fast as
>>> I had hoped, approximately 400-500ms each.  Given that *top* (on
>>> Centos) shows 5-8 GB of free memory on each server, I would assume that the
>>> entire index has been paged into memory (I had worried about disk
>>> performance previously, as we are working in a virtualized environment).
>>>
>>> A stats query on the index in question shows that the index is composed
>>> of > 7000 segments.  This seemed high to me, but maybe it's appropriate.
>>> Regardless, I dispatched an optimize command, but I am not seeing any
>>> progress and the command has not returned.  Current merges remains at zero,
>>> and the segment count is not changing.  Checking out hot threads in
>>> ElasticHQ, I initially saw an optimize call in the stack that was blocked
>>> on a waitForMerge call.  This however has disappeared, and I'm seeing no
>>> evidence that the optimize is occurring.
>>>
>>> Does any of this seem out of the norm or unusual?  Has anyone else had
>>> similar issues?  This is the second time I have tried to optimize an index
>>> since upgrading.  I've gotten the same result both times.
>>>
>>> Thanks in advance for any help/tips!
>>>
>>> - Elliott
>>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/5391291f-5c5e-4088-a1f2-93272beef0bb%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/5391291f-5c5e-4088-a1f2-93272beef0bb%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adrien Grand
