Yep, turns out that call runs _status against the entire cluster every time it fires. That might get... uncomfortable. We're filing a bug report with Elastica to, at the very least, get that codepath marked as deprecated in their documentation.
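In case it helps anyone else hitting this: below is a minimal sketch of the lighter liveness check we're switching to, going through Elastica's generic request() escape hatch to _cluster/health instead of the _status-backed getStatus(). The host/port config is a placeholder, not our production setup.

    <?php
    // Minimal sketch, assuming the Elastica versions current as of this
    // thread; host/port values are placeholders.
    // _cluster/health is one cheap cluster-level call, unlike _status,
    // which fans out to every shard copy in the cluster.
    require 'vendor/autoload.php';

    use Elastica\Client;
    use Elastica\Request;

    $client = new Client(['host' => 'localhost', 'port' => 9200]);

    function haveUsableNode(Client $client)
    {
        try {
            $health = $client->request('_cluster/health', Request::GET)->getData();
            // "yellow" is good enough for "is anyone answering at all".
            return in_array($health['status'], ['green', 'yellow'], true);
        } catch (\Exception $e) {
            // Any transport/connection failure means no reachable node.
            return false;
        }
    }

Two more notes, on the _status fan-out and on the shard math, follow the quoted thread below.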
On Tuesday, October 21, 2014 3:29:47 PM UTC-4, David Ashby wrote:

> Hmm, maybe. We are using the Elastica PHP library and call getStatus()->getServerStatus() relatively often (to try and work around Elastica's lack of proper error handling for unreachable nodes) to determine whether we have a node we can connect to. If that call maps to IndicesStatusRequest in the end, we might be shooting ourselves in the foot.
>
> On Tuesday, October 21, 2014 3:25:10 PM UTC-4, Jörg Prante wrote:
>
>> Maybe you are hit by https://github.com/elasticsearch/elasticsearch/issues/7385
>>
>> Jörg
>>
>> On Tue, Oct 21, 2014 at 9:17 PM, [email protected] <[email protected]> wrote:
>>
>>> This has nothing to do with OpenJDK.
>>>
>>> IndicesStatusRequest (deprecated, and to be removed in future versions) is a heavy request; there may be something on your machines that takes longer than 5 seconds, so the request times out.
>>>
>>> The IndicesStatus action uses Lucene's Directories.estimateSize. This call can take some time on large directories; maybe you have many segments or unoptimized shards/indices.
>>>
>>> Jörg
>>>
>>> On Tue, Oct 21, 2014 at 6:21 PM, David Ashby <[email protected]> wrote:
>>>
>>>> I should also note that I've been using OpenJDK. I'm currently in the process of moving to the official Oracle binaries; are there specific optimization changes there that help with inter-cluster IO? There are some hints at that in this very old interview about GitHub's Elasticsearch setup: http://exploringelasticsearch.com/github_interview.html
>>>>
>>>> On Monday, October 20, 2014 3:49:39 PM UTC-4, David Ashby wrote:
>>>>
>>>>> Example log line:
>>>>>
>>>>> [DEBUG][action.admin.indices.status] [Red Ronin] [*index*][1], node[t60FJtJ-Qk-dQNrxyg8faA], [R], s[STARTED]: failed to executed [org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@36239161]
>>>>> org.elasticsearch.transport.NodeDisconnectedException: [Shotgun][inet[/IP:9300]][indices/status/s] disconnected
>>>>>
>>>>> When the cluster gets into this state, all requests hang waiting for... something to happen, and it even stops responding to bigdesk requests. Each individual node returns 200 when curled locally. A huge number of the log lines above appear at the end of this process -- one for every single shard on the node, which is a huge vomit into my logs. As soon as a node is restarted, the cluster "snaps back": it immediately fails the outstanding requests and begins rebalancing.
>>>>>
>>>>> On Monday, October 20, 2014 11:34:36 AM UTC-4, David Ashby wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We've been using Elasticsearch on AWS for our application for two purposes: as a search engine for user-created documents, and as a cache for activity feeds in our application. We made a decision early on to treat every customer's content as a distinct index, for full logical separation of customer data. We have about three hundred indexes in our cluster, with the default 5-shard/1-replica setup.
>>>>>>
>>>>>> Recently, we've had major problems with the cluster "locking up" on requests and losing track of its nodes. We initially responded by attempting to remove possible CPU and memory limits, and placed all nodes in the same AWS placement group to maximize inter-node bandwidth, all to no avail.
>>>>>> We eventually lost an entire production cluster, which led to a decision to split the indexes across two completely independent clusters, each taking half of the indexes, with application-level logic determining where each index lives.
>>>>>>
>>>>>> All that is to say: with our setup, are we running into an undocumented *practical* limit on the number of indexes or shards in a cluster? It works out to around 3000 shards in our case. Our logs show evidence of nodes timing out their responses to massive shard status-checks, and it gets *worse* the more nodes there are in the cluster. It's also stable with only *two* nodes.
>>>>>>
>>>>>> Thanks,
>>>>>> -David
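A follow-up on Jörg's point above: a cluster-wide _status fans out one request per shard copy, so its cost tracks the total shard count rather than the index count. A rough sketch of reading that number from the standard _cluster/health response (same placeholder client setup as the sketch near the top):

    <?php
    require 'vendor/autoload.php';

    use Elastica\Client;
    use Elastica\Request;

    $client = new Client(['host' => 'localhost', 'port' => 9200]);

    // active_shards counts every started primary and replica copy, which is
    // roughly how many per-shard answers one cluster-wide _status must gather.
    $health = $client->request('_cluster/health', Request::GET)->getData();
    printf("_status fans out to roughly %d shard copies\n", $health['active_shards']);

In our case that number sat around 3000, which lines up with Jörg's note about the request timing out once any of those answers takes longer than 5 seconds.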
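And the arithmetic behind the ~3000-shard figure, plus what trimming it at index-creation time could look like. The index name and shard counts below are illustrative placeholders, not our production values:

    <?php
    require 'vendor/autoload.php';

    use Elastica\Client;

    $client = new Client(['host' => 'localhost', 'port' => 9200]);

    // 300 customer indexes x 5 primaries x (primary + 1 replica) = 3000 shard copies.
    $totalShards = 300 * 5 * (1 + 1); // 3000

    // A small per-customer index likely doesn't need the default 5 primaries.
    // 'customer_123' and both counts here are placeholders.
    $client->getIndex('customer_123')->create([
        'settings' => [
            'number_of_shards'   => 1,
            'number_of_replicas' => 1,
        ],
    ]);

With one primary per index, the same three hundred indexes come to 600 shard copies cluster-wide instead of 3000.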
