I see, thanks. I wasn't expecting to have to check for shard failures. Would it make sense to have a config or request setting whereby any shard failures are reported as exceptions (Java API)? In the space where we have been using Elasticsearch, partial results are bad, so we would end up always having to write code to turn shard failures into exceptions, along the lines of the sketch below....
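
Just to show what I mean, this is roughly the check we would have to wrap around every search. It's only a sketch against the 1.x Java client, using SearchResponse.getFailedShards()/getTotalShards()/getShardFailures(); the ShardFailureException and searchOrFail names are just mine, not anything in the API:

// Rough sketch of the wrapper we would need everywhere; ShardFailureException
// and searchOrFail are illustrative names, not part of the Elasticsearch API.
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.ShardSearchFailure;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilder;

public final class StrictSearch {

    // Unchecked exception so callers cannot silently ignore partial results.
    public static class ShardFailureException extends RuntimeException {
        public ShardFailureException(String message) {
            super(message);
        }
    }

    // Runs the search and throws if any shard failed, instead of handing
    // back a partial result set.
    public static SearchResponse searchOrFail(Client client, String index, QueryBuilder query) {
        SearchResponse response = client.prepareSearch(index)
                .setQuery(query)
                .execute()
                .actionGet();
        if (response.getFailedShards() > 0) {
            StringBuilder reasons = new StringBuilder();
            for (ShardSearchFailure failure : response.getShardFailures()) {
                reasons.append(failure.shard()).append(": ")
                       .append(failure.reason()).append("; ");
            }
            throw new ShardFailureException(response.getFailedShards() + " of "
                    + response.getTotalShards() + " shards failed: " + reasons);
        }
        return response;
    }
}

Having a setting that did this for us would save repeating that boilerplate in every client.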
-Nick

On Friday, 20 June 2014 13:26:35 UTC+1, kimchy wrote:
>
> Ahh, I see. If its related to searches, then yea, the search response
> includes details about the total shards that the search was executed on,
> the successful shards, and failed shards. They are important to check to
> understand if one gets partial results.
>
> In the REST API, if there is a total “failure”, then it will return the
> “worst” status code out of all the shards in the response. In the Java API,
> the search response will be returned (with no exception), so the content of
> the search has to be checked (which is a good practice anyhow). It might
> make sense to raise an exception in the Java API if all shards failed, I am
> on the fence on this one, since anyhow a check needs to be performed on the
> result.
>
> On Jun 20, 2014, at 13:22, Nikolas Everett <[email protected]> wrote:
>
> On Fri, Jun 20, 2014 at 7:08 AM, Shay Banon <[email protected]> wrote:
>> If it fails on the primary shard, then a failure is returned. If it
>> worked, and a replica failed, then that replica is deemed a failed replica,
>> and will get allocated somewhere else in the cluster. Maybe an example of
>> where a failure on “all” shards would help here?
>
> I think its more about searches and they can fail on one shard but not
> other for all sorts of reasons. Queue full, unfortunate script, bug, only
> one shard had results and the query asked for something weird like to use
> the postings highlighter when postings aren't stored. Lots of reasons.
>
> I log the event and move on. I toyed with outputting a warning to the
> user but didn't have time to implement it. We're pretty diligent with our
> logs so we'd notice the log and run it down.
>
> If the failure is caused by the queue being full only on one node, we'd
> likely notice that real quick as ganglia would lose it. This happened to
> me recently when we put a node without an ssd into a cluster with ssds. It
> couldn't keep up and dropped a ton of searches. In our defense, we didn't
> know the rest of the cluster had ssds so we were double surprised.
>
> Nik
