Hmm, maybe. We are using the Elastica PHP library and call 
getStatus()->getServerStatus() 
relatively often (to try to work around Elastica's lack of proper error 
handling for unreachable nodes) to determine whether we have a node we can 
connect to. If that call maps to an IndicesStatusRequest in the end, we 
might be shooting ourselves in the foot.
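
For reference, here is roughly what we do today and the lighter-weight 
probe I'm considering instead. This is only a sketch against Elastica ~1.x 
as we use it: the canReachCluster() helper is a name I made up, and I'm 
assuming that building the Status object is what issues the heavy /_status 
(IndicesStatusRequest) call, while _cluster/health stays cheap.

<?php
use Elastica\Client;
use Elastica\Request;
use Elastica\Exception\ConnectionException;

// Assumes Composer's autoloader and Elastica ~1.x.
require 'vendor/autoload.php';

$client = new Client(array('host' => 'localhost', 'port' => 9200));

// What we call today -- constructing the Status object already issues
// the /_status request before getServerStatus() is even reached:
// $client->getStatus()->getServerStatus();

// Sketch of a cheaper liveness check: _cluster/health is a small
// cluster-level call, and a transport failure means "no reachable node".
function canReachCluster(Client $client)
{
    try {
        $client->request('_cluster/health', Request::GET);
        return true;
    } catch (ConnectionException $e) {
        // Thrown by Elastica when it cannot reach the configured node.
        return false;
    }
}

If that reading of getStatus() is right, swapping in a probe like this 
would keep the reachability check without triggering a full indices status 
request every time. (In practice we'd probably catch Elastica's broader 
exception types as well.)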

On Tuesday, October 21, 2014 3:25:10 PM UTC-4, Jörg Prante wrote:
>
> Maybe you are hit by 
> https://github.com/elasticsearch/elasticsearch/issues/7385
>
> Jörg
>
> On Tue, Oct 21, 2014 at 9:17 PM, [email protected] wrote:
>
>> This has nothing to do with OpenJDK.
>>
>> IndicesStatusRequest (deprecated, to be removed in a future version) is 
>> a heavy request; there may be something on your machines that takes 
>> longer than 5 seconds, so the request times out.
>>
>> The IndicesStatus action uses Lucene's Directories.estimateSize. This 
>> call might take some time on large directories; maybe you have many 
>> segments / unoptimized shards / indices.
>>
>> Jörg
>>
>> On Tue, Oct 21, 2014 at 6:21 PM, David Ashby <[email protected]> wrote:
>>
>>> I should also note that I've been using OpenJDK. I'm currently in the 
>>> process of moving to the official Oracle binaries; are there specific 
>>> optimization changes there that help with inter-cluster IO? There are 
>>> some hints at that in this very old github-elasticsearch interview 
>>> <http://exploringelasticsearch.com/github_interview.html>.
>>>
>>>
>>> On Monday, October 20, 2014 3:49:39 PM UTC-4, David Ashby wrote:
>>>>
>>>> example log line: [DEBUG][action.admin.indices.status] [Red Ronin] 
>>>> [*index*][1], node[t60FJtJ-Qk-dQNrxyg8faA], [R], s[STARTED]: failed to 
>>>> executed 
>>>> [org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@36239161]
>>>>  
>>>> org.elasticsearch.transport.NodeDisconnectedException: 
>>>> [Shotgun][inet[/IP:9300]][indices/status/s] disconnected
>>>>
>>>> When the cluster gets into this state, all requests hang waiting for... 
>>>> something to happen. Each individual node returns 200 when curled 
>>>> locally, but the cluster even stops responding to bigdesk requests. A 
>>>> huge number of copies of the above log line appear at the end of this 
>>>> process -- one for every single shard on the node -- which floods my 
>>>> logs. As soon as a node is restarted the cluster "snaps back", 
>>>> immediately fails outstanding requests, and begins rebalancing.
>>>>
>>>> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We've been using elasticsearch on AWS for our application for two 
>>>>> purposes: as a search engine for user-created documents, and as a cache 
>>>>> for 
>>>>> activity feeds in our application. We made a decision early on to treat 
>>>>> every customer's content as a distinct index, for full logical separation 
>>>>> of customer data. We have about three hundred indexes in our cluster, 
>>>>> with 
>>>>> the default 5-shards/1-replica setup.
>>>>>
>>>>> Recently, we've had major problems with the cluster "locking up" to 
>>>>> requests and losing track of its nodes. We initially responded by 
>>>>> attempting to remove possible CPU and memory limits, and placed all nodes 
>>>>> in the same AWS placement group, to maximize inter-node bandwidth, all to 
>>>>> no avail. We eventually lost an entire production cluster, resulting in a 
>>>>> decision to split the indexes across two completely independent clusters, 
>>>>> each cluster taking half of the indexes, with application-level logic 
>>>>> determining where the indexes were.
>>>>>
>>>>> All that is to say: with our setup, are we running into an 
>>>>> undocumented *practical* limit on the number of indexes or shards in 
>>>>> a cluster? It works out to around 3000 shards (300 indexes x 5 shards 
>>>>> x 2 copies, counting replicas). Our logs show evidence of nodes timing 
>>>>> out their responses to massive shard status-checks, and it gets 
>>>>> *worse* the more nodes there are in the cluster; with only *two* 
>>>>> nodes, it's stable.
>>>>>
>>>>> Thanks,
>>>>> -David
>>>>>
