Greetings, During our rollout of 9.8 we discovered an interesting behavior indirectly caused by the Http2SolrClient migration of IndexFetcher:
https://github.com/apache/solr/commit/25194b02caa383feda293490eed6ccbd7c3ecf05#diff-7af383a173bd8e05414b341ab08e9ca715b665077112c64150c4db00811d16a6 The change itself does not appear to be the problem, but rather the default behavior of Http2SolrClient applying the *idle* timeout to the overall request time: https://github.com/apache/solr/blob/2b8f933529fa736fe5fd2a9b0c751bedf352f0c7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L625-L629 Apparently this choice of default has some history: https://github.com/apache/solr/commit/a80eb84d5659a06a10860ad2470e87d80b19fa5d + in its current form: https://github.com/apache/solr/commit/d70af456058174d15a25d3c9b8cc5f7a8899b62b At any rate, in most cases this goes unnoticed because the default idle timeout is quite long (120 seconds) but can cause problems when applied to something like IndexFetcher which is probably *expected* to have sometimes really long-lived, healthy connections exceeding the 120s period. An *idle* timeout being applied to a long-lived, non-idle connection doesn't seem quite right... We saw this during replication of a 5GB segment which, at our bandwidth at the time, exceeded the 120 second time window and caused the Cloud to get stuck in a replication loop. The stacktrace tells the full story, namely in 120 seconds we were only able to copy 4.5GB (implied 300 Mbps) and were interrupted by this arguably misapplied idle timeout (on a clearly non-idle connection!) 2025-03-17 20:47:15.144 WARN IndexFetcher [XXXX] ? [XXXX] - Error in fetching file: _7rxzv.cfs (downloaded 457179136 of 5529080446 bytes) java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 120000 ms elapsed at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:343) ~[jetty-client-10.0.22.jar:10.0.22] at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:311) ~[jetty-client-10.0.22.jar:10.0.22] at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133) ~[?:?] at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:87) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:138) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:176) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1849) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1790) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1772) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1192) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:679) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:410) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:522) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:243) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:697) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) ~[metrics-core-4.2.26.jar:4.2.26] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:380) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49] at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.base/java.lang.Thread.run(Thread.java:829) [?:?] Caused by: java.util.concurrent.TimeoutException: Total timeout 120000 ms elapsed at org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:342) ~[jetty-client-10.0.22.jar:10.0.22] at org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:322) ~[jetty-client-10.0.22.jar:10.0.22] at org.eclipse.jetty.io.CyclicTimeouts.onTimeoutExpired(CyclicTimeouts.java:110) ~[jetty-io-10.0.22.jar:10.0.22] at org.eclipse.jetty.io.CyclicTimeouts$Timeouts.onTimeoutExpired(CyclicTimeouts.java:197) ~[jetty-io-10.0.22.jar:10.0.22] at org.eclipse.jetty.io.CyclicTimeout$Wakeup.run(CyclicTimeout.java:294) ~[jetty-io-10.0.22.jar:10.0.22] at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?] If the community agrees this is a bug I'd be happy to raise a JIRA ticket and attempt a fix, which I think should be simple. What do you all think? Thanks, Luke