The unhealthy clusters had four or five nodes each. We switched to two 
two-node clusters, and those have been stable.

Bigdesk reports that file descriptors, memory, and CPU all have plentiful 
headroom in all cases.
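
For reference, a quick cross-check of those numbers straight from the nodes 
stats API looks roughly like this (a sketch only; it assumes the Python 
requests library and a cluster reachable on localhost:9200):

    import requests

    # Pull per-node process and JVM stats from the cluster.
    url = 'http://localhost:9200/_nodes/stats/process,jvm'
    stats = requests.get(url).json()
    for node in stats['nodes'].values():
        print("%s open_fds=%s heap_used_pct=%s" % (
            node['name'],
            node['process']['open_file_descriptors'],
            node['jvm']['mem']['heap_used_percent']))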

On Monday, October 20, 2014 11:54:21 AM UTC-4, Jörg Prante wrote:
>
> How many nodes do you have in your cluster? 
>
> Have you checked whether your nodes are running out of file descriptors 
> or heap memory?
>
> Jörg
>
> On Mon, Oct 20, 2014 at 5:52 PM, David Ashby <[email protected]> wrote:
>
>> I might also note: the size of these indexes varies wildly, from just a 
>> few documents to many thousands, more or less following a power law.
>>
>>
>> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>>>
>>> Hi,
>>>
>>> We've been using elasticsearch on AWS for two purposes: as a search 
>>> engine for user-created documents, and as a cache for activity feeds in 
>>> our application. We made a decision early on to treat every customer's 
>>> content as a distinct index, for full logical separation of customer 
>>> data. We have about three hundred indexes in our cluster, each with the 
>>> default 5-shard/1-replica setup.
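>>>
>>> (For concreteness, creating one of these customer indexes looks roughly 
>>> like the sketch below; it assumes the Python requests library and a 
>>> cluster on localhost:9200, and the index name is purely illustrative. 
>>> With the default layout, every index adds ten shards: five primaries 
>>> plus one replica of each.)
>>>
>>> import json
>>> import requests
>>>
>>> # One index per customer; the defaults give 5 primaries + 5 replicas.
>>> settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 1}}
>>> requests.put('http://localhost:9200/customer-1234',
>>>              data=json.dumps(settings))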
>>>
>>> Recently, we've had major problems with the cluster "locking up" and no 
>>> longer responding to requests, and losing track of its nodes. We 
>>> initially responded by removing any possible CPU and memory limits and 
>>> by placing all nodes in the same AWS placement group to maximize 
>>> inter-node bandwidth, all to no avail. We eventually lost an entire 
>>> production cluster, which led us to split the indexes across two 
>>> completely independent clusters, each taking half of the indexes, with 
>>> application-level logic determining which cluster holds which index.
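>>>
>>> (The application-level routing is nothing fancy; roughly this shape, 
>>> with made-up hostnames and a hypothetical helper name:)
>>>
>>> import hashlib
>>>
>>> # Hypothetical application-side routing between the two clusters.
>>> CLUSTERS = ['http://es-cluster-a:9200', 'http://es-cluster-b:9200']
>>>
>>> def cluster_for(customer_id):
>>>     # Stable hash so a customer's index always maps to the same cluster.
>>>     digest = hashlib.md5(customer_id.encode('utf-8')).hexdigest()
>>>     return CLUSTERS[int(digest, 16) % len(CLUSTERS)]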
>>>
>>> All that is to say: are we running into an undocumented *practical* 
>>> limit on the number of indexes or shards in a cluster? Our setup ends 
>>> up with around 3000 shards. Our logs show evidence of nodes timing out 
>>> their responses to massive shard status checks, and it gets *worse* the 
>>> more nodes there are in the cluster; the cluster is only stable with 
>>> *two* nodes.
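>>>
>>> (The 3000 figure is just the arithmetic: roughly 300 indexes x 5 
>>> primaries x 2 copies. A quick way to confirm it from cluster health, 
>>> sketched with the Python requests library against a hypothetical 
>>> localhost:9200:)
>>>
>>> import requests
>>>
>>> health = requests.get('http://localhost:9200/_cluster/health').json()
>>> print("nodes=%s primaries=%s total_shards=%s" % (
>>>     health['number_of_nodes'],
>>>     health['active_primary_shards'],
>>>     health['active_shards']))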
>>>
>>> Thanks,
>>> -David
>>>
>
>
