Timothy Potter created SOLR-8226:
------------------------------------

             Summary: Is a SocketTimeoutException really a reliable indicator 
of a zombie in LBHttpSolrClient?
                 Key: SOLR-8226
                 URL: https://issues.apache.org/jira/browse/SOLR-8226
             Project: Solr
          Issue Type: Improvement
          Components: SolrJ
            Reporter: Timothy Potter
            Assignee: Timothy Potter


In LBHttpSolrClient, we do:

{code}
} catch (SocketTimeoutException e) {
  if (!isUpdate) {
    ex = (!isZombie) ? addZombie(client, e) : e;
  } else {
    throw e;
  }
}
{code}

If I have a reasonably low socket timeout configured for my 
HttpShardHandlerFactory and we hit a slow query, then a perfectly healthy 
replica gets put into the zombie list, potentially creating a herd effect 
on my other replicas because there is now one fewer replica in the rotation. 
Moreover, HttpShardHandlerFactory does not let me configure the check interval 
for adding zombies back into rotation, so a potentially healthy replica is out 
of rotation for a full minute. At the very least, the interval should be 
configurable for the HttpShardHandlerFactory, but we should also strive to 
differentiate between a slow response and a true zombie.
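
One possible way to tell a slow response from a true zombie is to require 
several consecutive socket timeouts before taking a server out of rotation, 
while still evicting immediately on a refused connection. The sketch below is 
only an illustration of that idea under stated assumptions: 
TimeoutZombiePolicy, its methods, and both constructor parameters are 
hypothetical names, not part of SolrJ, and it is not wired into 
LBHttpSolrClient or HttpShardHandlerFactory.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not an existing SolrJ class. */
final class TimeoutZombiePolicy {
  // Consecutive socket timeouts tolerated before a server is treated as a zombie (assumed setting).
  private final int timeoutThreshold;
  // Interval at which zombies would be re-checked; configurable here rather than fixed (assumed setting).
  private final long checkIntervalMillis;
  // Consecutive timeout count per server URL.
  private final Map<String, Integer> consecutiveTimeouts = new HashMap<>();

  TimeoutZombiePolicy(int timeoutThreshold, long checkIntervalMillis) {
    if (timeoutThreshold < 1 || checkIntervalMillis < 1) {
      throw new IllegalArgumentException("threshold and interval must be positive");
    }
    this.timeoutThreshold = timeoutThreshold;
    this.checkIntervalMillis = checkIntervalMillis;
  }

  long getCheckIntervalMillis() {
    return checkIntervalMillis;
  }

  /** Returns true if the caller should move the server to the zombie list. */
  synchronized boolean recordFailure(String serverUrl, Exception e) {
    if (e instanceof ConnectException) {
      // Assumption: a refused connection means the server is unreachable, so evict at once.
      consecutiveTimeouts.remove(serverUrl);
      return true;
    }
    if (e instanceof SocketTimeoutException) {
      // A single timeout may just be a slow query; evict only after repeated timeouts.
      int count = consecutiveTimeouts.merge(serverUrl, 1, Integer::sum);
      if (count >= timeoutThreshold) {
        consecutiveTimeouts.remove(serverUrl);
        return true;
      }
      return false;
    }
    // Other failures are left to the caller's existing handling.
    return false;
  }

  /** A successful response resets the consecutive timeout count for that server. */
  synchronized void recordSuccess(String serverUrl) {
    consecutiveTimeouts.remove(serverUrl);
  }
}
```

With a threshold of 2, for example, one slow query followed by a successful 
response would leave the replica in rotation, whereas two timeouts in a row 
would still evict it. Whether a count-based threshold is the right signal, as 
opposed to something like a lightweight ping after the timeout, is an open 
design question.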



