Hi,

I think I found a bug in LBHttp2SolrClient. When the idle timeout is
exceeded, the underlying exception is a TimeoutException. Http2SolrClient
wraps it in a SolrServerException
<https://github.com/apache/solr/blob/6a9e33b5c61edd64aef85508718a0d6319677b32/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L544-L549>
whose direct cause is an IOException, which in turn has the TimeoutException
as its cause. However, the handling here
<https://github.com/apache/solr/blob/6a9e33b5c61edd64aef85508718a0d6319677b32/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttp2SolrClient.java#L284-L290>
uses getRootCause()
<https://github.com/apache/solr/blob/6a9e33b5c61edd64aef85508718a0d6319677b32/solr/solrj/src/java/org/apache/solr/client/solrj/SolrServerException.java#L40C20-L40C32>
which walks past the IOException and returns the inner TimeoutException. That
exception fails the check against IOException, so the server is not marked as
a zombie and unresponsive servers keep receiving traffic. I believe the
expected behavior is that an unresponsive server is marked as a zombie.
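
To illustrate the mismatch, here is a minimal standalone sketch. It is not
the Solr code: RootCauseDemo and its methods are hypothetical names, a plain
RuntimeException stands in for SolrServerException, and rootCause() only
assumes that getRootCause() walks the cause chain to its innermost element.
The chain-walking check at the end is one possible direction for a fix, not
necessarily the one a PR would take.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; none of these names exist in Solr.
final class RootCauseDemo {

  // Follows getCause() until it reaches the innermost throwable.
  static Throwable rootCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  // Looks only at the root cause, so an IOException that itself has a cause is missed.
  static boolean rootCauseIsIOException(Throwable t) {
    return rootCause(t) instanceof IOException;
  }

  // Possible alternative: true if any throwable in the chain is an IOException.
  static boolean chainContainsIOException(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof IOException) {
        return true;
      }
    }
    return false;
  }

  // Builds wrapper -> IOException -> TimeoutException, mirroring the chain described above.
  // RuntimeException is only a stand-in for SolrServerException.
  static Exception idleTimeoutFailure() {
    return new RuntimeException(
        "request failed", new IOException(new TimeoutException("Idle timeout expired")));
  }

  public static void main(String[] args) {
    Exception e = idleTimeoutFailure();
    System.out.println(rootCause(e).getClass().getSimpleName()); // TimeoutException
    System.out.println(rootCauseIsIOException(e));               // false
    System.out.println(chainContainsIOException(e));             // true
  }
}
```

With the root-cause check the timeout is not treated as an I/O failure, while
a check over the whole cause chain would catch it.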

It should be fairly straightforward to set up a test that reproduces this,
and to fix the issue. Happy to send a PR.
