James Dyer created SOLR-17234:
---------------------------------
Summary: LBHttp2SolrClient does not skip "zombie" endpoints
Key: SOLR-17234
URL: https://issues.apache.org/jira/browse/SOLR-17234
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrJ
Affects Versions: main (10.0)
Reporter: James Dyer
While working on SOLR-14763, I found different behavior with
*LBHttp2SolrClient* between *branch_9x* and *main/10.x*.
If the first Endpoint in the list has previously failed, *branch_9x* skips
the failed Endpoint on subsequent requests and starts with the second
Endpoint. If all remaining Endpoints fail, it then retries the first
Endpoint.
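To make the *branch_9x* ordering concrete, here is a minimal sketch of "skip zombies first, fall back to them last". It is not the SolrJ implementation: the class name ZombieAwareOrder, the method attemptOrder, and the use of plain URL strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; names and types are NOT taken from SolrJ.
class ZombieAwareOrder {

  /**
   * Returns the endpoints in the order they would be attempted:
   * endpoints not known to have failed come first, and known-failed
   * ("zombie") endpoints are deferred to the end as a last resort.
   */
  static List<String> attemptOrder(List<String> endpoints, Set<String> zombies) {
    List<String> live = new ArrayList<>();
    List<String> deferred = new ArrayList<>();
    for (String endpoint : endpoints) {
      if (zombies.contains(endpoint)) {
        deferred.add(endpoint); // known to fail: try only if all others fail
      } else {
        live.add(endpoint);
      }
    }
    live.addAll(deferred);
    return live;
  }

  public static void main(String[] args) {
    List<String> endpoints = List.of("http://host1/solr", "http://host2/solr");
    Set<String> zombies = Set.of("http://host1/solr");
    // prints [http://host2/solr, http://host1/solr]
    System.out.println(attemptOrder(endpoints, zombies));
  }
}
```

With the first Endpoint marked as a zombie, the sketch attempts the second Endpoint first and only returns to the first one afterwards, which matches the behavior described above.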
If the first Endpoint in the list has previously failed, *main/10.x* still
always tries the first Endpoint, despite it being in the "Zombie List". When
the first Endpoint fails again, it then retries the request against the
second Endpoint.
The *branch_9x* behavior seems more desirable as this minimizes unnecessary
work by avoiding Endpoints that are known to fail. Indeed, *main/10.x* has an
obvious bug in *EndpointIterator#fetchNext* where it attempts to get the wrong
type of key for the map holding the Zombies. I believe this difference is a
regression in *main/10.x*.
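As a general illustration of how a wrong-key-type lookup can go unnoticed, the sketch below shows a map lookup that compiles but never matches. It does not reproduce the code in *EndpointIterator#fetchNext*: the Endpoint stand-in class, the String-keyed map, and both helper methods are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these types are the SolrJ classes.
class ZombieLookupDemo {

  // Stand-in endpoint type, assumed here to wrap a base URL string.
  static final class Endpoint {
    final String baseUrl;

    Endpoint(String baseUrl) {
      this.baseUrl = baseUrl;
    }
  }

  // Wrong key type: Map#containsKey accepts any Object, so this compiles,
  // but an Endpoint instance never equals a String key.
  static boolean isZombieWrongKey(Map<String, Long> zombies, Endpoint endpoint) {
    return zombies.containsKey(endpoint);
  }

  // Right key type: look up by the same String the map is keyed on.
  static boolean isZombieRightKey(Map<String, Long> zombies, Endpoint endpoint) {
    return zombies.containsKey(endpoint.baseUrl);
  }

  public static void main(String[] args) {
    Map<String, Long> zombies = new HashMap<>();
    zombies.put("http://host1/solr", System.nanoTime());
    Endpoint first = new Endpoint("http://host1/solr");
    System.out.println(isZombieWrongKey(zombies, first)); // prints false
    System.out.println(isZombieRightKey(zombies, first)); // prints true
  }
}
```

If the lookup silently returns "not a zombie" in this way, every Endpoint looks healthy and the first one is always tried, which would be consistent with the *main/10.x* behavior described above.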
The different behavior is recorded in test
*LBHttp2SolrClientTest#testAsyncWithFailures*. This test was added
after-the-fact with SOLR-14763. I needed to change its "asserts" when
backporting to *branch_9x* to account for the changed behavior.