Yea but was the HttpSolrClient applying the default *idle* timeout (socket timeout) as the total request timeout? These are different things, especially for requests where the response is streamed slowly over long periods of time (so not idle .. just slow). I couldn't find this in the old HttpSolrClient and also have never seen this happening before.
Sent from Bloomberg Professional for Android ----- Original Message ----- From: Sanjay Dutt <dev@solr.apache.org> To: dev@solr.apache.org CC: gus.h...@gmail.com At: 03/19/25 21:57:27 UTC-04:00 I believe before IndexFetcher was creating HttpSolrClient using default setting for timeouts, I will confirm it once. DEFAULT_CONNECT_TIMEOUT = 60000; DEFAULT_SO_TIMEOUT = 600000; More Info here :- solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClientUtil.java On Thu, Mar 20, 2025 at 5:56 AM David Smiley <dsmi...@apache.org> wrote: > *Excellent* research Luke! > > Yes, please file a JIRA and propose a fix. I also have noticed timeouts > named one thing and applied in a different way. Very suspicious. > > Hey Gus; if memory serves, did you discuss that once? > > On Wed, Mar 19, 2025 at 6:28 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) < > lkotzanie...@bloomberg.net> wrote: > > > Greetings, > > > > During our rollout of 9.8 we discovered an interesting > > behavior indirectly caused by the Http2SolrClient migration > > of IndexFetcher: > > > > > > > https://github.com/apache/solr/commit/25194b02caa383feda293490eed6ccbd7c3ecf05#diff-7af383a173bd8e05414b341ab08e9ca715b665077112c64150c4db00811d16a6 > > > > The change itself does not appear to be the problem, but > > rather the default behavior of Http2SolrClient applying > > the *idle* timeout to the overall request time: > > > > > > > https://github.com/apache/solr/blob/2b8f933529fa736fe5fd2a9b0c751bedf352f0c7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L625-L629 > > > > Apparently this choice of default has some history: > > > > > > > https://github.com/apache/solr/commit/a80eb84d5659a06a10860ad2470e87d80b19fa5d > > + > > in its current form: > > > > > https://github.com/apache/solr/commit/d70af456058174d15a25d3c9b8cc5f7a8899b62b > > > > At any rate, in most cases this goes unnoticed because the > > default idle timeout is quite long (120 seconds) but can > > cause problems when applied to something like IndexFetcher > > which is probably *expected* to have sometimes really long-lived, > > healthy connections exceeding the 120s period. An *idle* > > timeout being applied to a long-lived, non-idle connection > > doesn't seem quite right... > > > > We saw this during replication of a 5GB segment which, at > > our bandwidth at the time, exceeded the 120 second time window > > and caused the Cloud to get stuck in a replication loop. > > > > The stacktrace tells the full story, namely in 120 seconds > > we were only able to copy 4.5GB (implied 300 Mbps) and were > > interrupted by this arguably misapplied idle timeout (on a > > clearly non-idle connection!) > > > > 2025-03-17 20:47:15.144 WARN IndexFetcher [XXXX] ? [XXXX] - Error in > > fetching file: _7rxzv.cfs (downloaded 457179136 of 5529080446 bytes) > > java.io.IOException: java.util.concurrent.TimeoutException: Total timeout > > 120000 ms elapsed > > at > > > org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:343) > > ~[jetty-client-10.0.22.jar:10.0.22] > > at > > > org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:311) > > ~[jetty-client-10.0.22.jar:10.0.22] > > at java.base/java.io > .FilterInputStream.read(FilterInputStream.java:133) > > ~[?:?] > > at > > > org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:87) > > ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user > - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:138) > > ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user > - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:176) > > ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user > - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1849) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1790) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1772) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1192) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:679) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:410) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:522) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:243) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:697) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) > > ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - > > 2025-01-29 16:21:49] > > at > > > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) > > ~[metrics-core-4.2.26.jar:4.2.26] > > at > > > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > > ~[?:?] > > at > > java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > > at > > > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:380) > > ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user > - > > 2025-01-29 16:21:49] > > at > > > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > > ~[?:?] > > at > > > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > > ~[?:?] > > at java.base/java.lang.Thread.run(Thread.java:829) [?:?] > > Caused by: java.util.concurrent.TimeoutException: Total timeout 120000 ms > > elapsed > > at > > > org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:342) > > ~[jetty-client-10.0.22.jar:10.0.22] > > at > > > org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:322) > > ~[jetty-client-10.0.22.jar:10.0.22] > > at org.eclipse.jetty.io > .CyclicTimeouts.onTimeoutExpired(CyclicTimeouts.java:110) > > ~[jetty-io-10.0.22.jar:10.0.22] > > at org.eclipse.jetty.io > .CyclicTimeouts$Timeouts.onTimeoutExpired(CyclicTimeouts.java:197) > > ~[jetty-io-10.0.22.jar:10.0.22] > > at org.eclipse.jetty.io > .CyclicTimeout$Wakeup.run(CyclicTimeout.java:294) > > ~[jetty-io-10.0.22.jar:10.0.22] > > at > > > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > > ~[?:?] > > at > > java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > > at > > > java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) > > ~[?:?] > > > > If the community agrees this is a bug I'd be happy to raise a JIRA > > ticket and attempt a fix, which I think should be simple. > > > > What do you all think? > > > > Thanks, > > Luke >