Timothy Potter created SOLR-8226:
------------------------------------

    Summary: Is a SocketTimeoutException really a reliable indicator of a zombie in LBHttpSolrClient?
    Key: SOLR-8226
    URL: https://issues.apache.org/jira/browse/SOLR-8226
    Project: Solr
    Issue Type: Improvement
    Components: SolrJ
    Reporter: Timothy Potter
    Assignee: Timothy Potter

In LBHttpSolrClient, we do:

{code}
} catch (SocketTimeoutException e) {
  if (!isUpdate) {
    ex = (!isZombie) ? addZombie(client, e) : e;
  } else {
    throw e;
  }
{code}

If I have a reasonably low socket timeout configured for my HttpShardHandlerFactory and we hit a slow query, then a perfectly healthy replica gets put into the zombie list, potentially creating a herd effect on my other replicas because there is now one less replica in the rotation. Moreover, HttpShardHandlerFactory does not let me configure the check interval for adding zombies back into rotation, so a potentially healthy replica stays out of rotation for a full minute.

At the very least, the interval should be configurable for the HttpShardHandlerFactory, but we should also strive to differentiate between a slow response and a true zombie.
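
One possible way to tell a slow response apart from a true zombie is sketched below. It is only an illustration, not the existing LBHttpSolrClient behavior and not a proposed patch. The class ZombiePolicy, its methods, and both constructor parameters are hypothetical names invented for this sketch. The assumptions are that a refused connection is strong evidence of a dead server, that a single read timeout is only weak evidence, and that the recheck interval is something the caller supplies rather than a fixed value.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical policy object (not part of SolrJ): decides whether a failed
 * request should move a server to the zombie list.
 */
class ZombiePolicy {
    private final int maxConsecutiveTimeouts;
    private final long checkIntervalMs;
    // Consecutive socket timeouts seen per server URL since its last success.
    private final Map<String, AtomicInteger> timeouts = new ConcurrentHashMap<>();

    ZombiePolicy(int maxConsecutiveTimeouts, long checkIntervalMs) {
        if (maxConsecutiveTimeouts < 1 || checkIntervalMs <= 0) {
            throw new IllegalArgumentException("threshold and interval must be positive");
        }
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        this.checkIntervalMs = checkIntervalMs;
    }

    /** Configurable interval at which zombies would be re-checked. */
    long getCheckIntervalMs() {
        return checkIntervalMs;
    }

    /** Returns true if the server should be treated as a zombie after this failure. */
    boolean shouldMarkZombie(String serverUrl, Exception e) {
        if (e instanceof ConnectException) {
            // Assumption: a refused connection means the server is really down.
            timeouts.remove(serverUrl);
            return true;
        }
        if (e instanceof SocketTimeoutException) {
            // Assumption: one timeout may just be a slow query; require a streak.
            int seen = timeouts
                .computeIfAbsent(serverUrl, k -> new AtomicInteger())
                .incrementAndGet();
            if (seen >= maxConsecutiveTimeouts) {
                timeouts.remove(serverUrl);
                return true;
            }
            return false;
        }
        // Other exception types are left to the caller in this sketch.
        return false;
    }

    /** A successful response resets the timeout streak for that server. */
    void recordSuccess(String serverUrl) {
        timeouts.remove(serverUrl);
    }
}
```

With a threshold of 3, a healthy replica that times out on one slow query and then answers the next request would stay in rotation, while a replica that refuses connections would be removed immediately. The threshold trades detection speed for tolerance of slow queries, so a suitable value would depend on the configured socket timeout.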