I believe IndexFetcher was previously creating HttpSolrClient with the
default timeout settings; I will confirm. Those defaults are:

DEFAULT_CONNECT_TIMEOUT = 60000;
DEFAULT_SO_TIMEOUT = 600000;

More info here:
solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpClientUtil.java
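
To make the contrast concrete, here is a rough sketch (mine, not actual
SolrJ code; the class and helper names are made up for illustration) of
what those legacy defaults mean next to the new 120 s idle timeout that,
per the report below, gets applied to the whole request:

```java
// Sketch (illustrative only, not SolrJ code): the legacy HttpClientUtil
// defaults vs. the new Http2SolrClient default, and what they mean for a
// long bulk fetch like index replication.
public class TimeoutDefaults {
    static final int DEFAULT_CONNECT_TIMEOUT_MS = 60_000;  // legacy: TCP connect
    static final int DEFAULT_SO_TIMEOUT_MS = 600_000;      // legacy: per-read socket inactivity
    static final long HTTP2_IDLE_TIMEOUT_MS = 120_000;     // new default, applied to the *whole* request

    /** Seconds needed to move `bytes` at `mbps` megabits per second. */
    static double transferSeconds(long bytes, double mbps) {
        return bytes * 8 / (mbps * 1_000_000);
    }

    public static void main(String[] args) {
        // The 5.5 GB segment from the report below, at ~30 Mbit/s:
        double needed = transferSeconds(5_529_080_446L, 30.0);
        System.out.printf("transfer needs ~%.0f s%n", needed);
        // The legacy SO_TIMEOUT only fires after 600 s with *no* data at
        // all, so a steadily streaming fetch never trips it; a 120 s cap
        // on the whole request fails any transfer over two minutes.
        System.out.println("exceeds 120 s total cap: " + (needed > 120));
    }
}
```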

On Thu, Mar 20, 2025 at 5:56 AM David Smiley <dsmi...@apache.org> wrote:

> *Excellent* research Luke!
>
> Yes, please file a JIRA and propose a fix.  I also have noticed timeouts
> named one thing and applied in a different way.  Very suspicious.
>
> Hey Gus; if memory serves, did you discuss that once?
>
> On Wed, Mar 19, 2025 at 6:28 PM Luke Kot-Zaniewski (BLOOMBERG/ 919 3RD A) <
> lkotzanie...@bloomberg.net> wrote:
>
> > Greetings,
> >
> > During our rollout of 9.8 we discovered an interesting
> > behavior indirectly caused by the Http2SolrClient migration
> > of IndexFetcher:
> >
> >
> >
> https://github.com/apache/solr/commit/25194b02caa383feda293490eed6ccbd7c3ecf05#diff-7af383a173bd8e05414b341ab08e9ca715b665077112c64150c4db00811d16a6
> >
> > The change itself does not appear to be the problem, but
> > rather the default behavior of Http2SolrClient applying
> > the *idle* timeout to the overall request time:
> >
> >
> >
> https://github.com/apache/solr/blob/2b8f933529fa736fe5fd2a9b0c751bedf352f0c7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L625-L629
> >
> > Apparently this choice of default has some history:
> >
> >
> >
> https://github.com/apache/solr/commit/a80eb84d5659a06a10860ad2470e87d80b19fa5d
> > and in its current form:
> >
> >
> https://github.com/apache/solr/commit/d70af456058174d15a25d3c9b8cc5f7a8899b62b
> >
> > At any rate, in most cases this goes unnoticed because the
> > default idle timeout is quite long (120 seconds), but it causes
> > problems for something like IndexFetcher, which is *expected* to
> > sometimes hold long-lived, healthy connections open well beyond
> > the 120 s period. An *idle* timeout being applied to a long-lived,
> > non-idle connection doesn't seem quite right...
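
This is exactly the distinction that matters here. A minimal,
self-contained model of the two semantics (my own illustration, not the
actual Jetty or SolrJ implementation): an idle timer resets on every
read, a total timer does not.

```java
// Minimal model of idle-timeout vs. total-timeout semantics.
// Illustrative only; not the actual Jetty or SolrJ implementation.
public class TimeoutModel {
    /** Simulate a transfer where a packet arrives every `packetGapMs`,
     *  lasting `totalMs` overall. Returns which timeout (if any) fires. */
    static String simulate(long totalMs, long packetGapMs,
                           long idleTimeoutMs, long totalTimeoutMs) {
        long lastActivity = 0;
        for (long now = packetGapMs; now <= totalMs; now += packetGapMs) {
            if (now - lastActivity > idleTimeoutMs) return "idle timeout";
            if (now > totalTimeoutMs) return "total timeout";
            lastActivity = now; // data arrived: idle timer resets, total does not
        }
        return "completed";
    }

    public static void main(String[] args) {
        // A healthy 10-minute transfer with data arriving every second:
        // a true 120 s *idle* timeout never fires...
        System.out.println(simulate(600_000, 1_000, 120_000, Long.MAX_VALUE));
        // ...but the same 120 s applied as a *total* cap kills it.
        System.out.println(simulate(600_000, 1_000, Long.MAX_VALUE, 120_000));
    }
}
```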
> >
> > We saw this during replication of a ~5.5 GB segment whose transfer,
> > at our bandwidth at the time, exceeded the 120-second window and
> > left the cloud stuck in a replication loop.
> >
> > The stack trace tells the full story: in 120 seconds we were only
> > able to copy about 0.46 GB (an implied ~30 Mbps) before being
> > interrupted by this arguably misapplied idle timeout (on a
> > clearly non-idle connection!)
> >
> > 2025-03-17 20:47:15.144 WARN  IndexFetcher [XXXX] ? [XXXX] - Error in fetching file: _7rxzv.cfs (downloaded 457179136 of 5529080446 bytes)
> > java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 120000 ms elapsed
> >     at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:343) ~[jetty-client-10.0.22.jar:10.0.22]
> >     at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:311) ~[jetty-client-10.0.22.jar:10.0.22]
> >     at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133) ~[?:?]
> >     at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:87) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:138) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:176) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1849) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1790) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1772) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1192) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:679) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:410) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:522) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:243) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:697) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212) ~[metrics-core-4.2.26.jar:4.2.26]
> >     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
> >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
> >     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:380) ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 2025-01-29 16:21:49]
> >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
> >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
> >     at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
> > Caused by: java.util.concurrent.TimeoutException: Total timeout 120000 ms elapsed
> >     at org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:342) ~[jetty-client-10.0.22.jar:10.0.22]
> >     at org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:322) ~[jetty-client-10.0.22.jar:10.0.22]
> >     at org.eclipse.jetty.io.CyclicTimeouts.onTimeoutExpired(CyclicTimeouts.java:110) ~[jetty-io-10.0.22.jar:10.0.22]
> >     at org.eclipse.jetty.io.CyclicTimeouts$Timeouts.onTimeoutExpired(CyclicTimeouts.java:197) ~[jetty-io-10.0.22.jar:10.0.22]
> >     at org.eclipse.jetty.io.CyclicTimeout$Wakeup.run(CyclicTimeout.java:294) ~[jetty-io-10.0.22.jar:10.0.22]
> >     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
> >     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
> >     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
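
The arithmetic implied by that first log line checks out (this is my own
quick calculation, using only the byte counts reported in the log; the
class and method names are made up):

```java
// Quick check of the transfer rate implied by the log line:
// "downloaded 457179136 of 5529080446 bytes" in a 120 s window.
public class RateCheck {
    /** Megabits per second implied by `bytes` moved in `seconds`. */
    static double impliedMbps(long bytes, double seconds) {
        return bytes * 8 / (seconds * 1_000_000);
    }

    public static void main(String[] args) {
        long downloaded = 457_179_136L;  // bytes copied before the timeout (from the log)
        long total = 5_529_080_446L;     // full size of _7rxzv.cfs (from the log)
        double mbps = impliedMbps(downloaded, 120.0);
        System.out.printf("observed rate: ~%.0f Mbit/s%n", mbps);
        // At that rate the whole file needs many multiples of the 120 s window:
        System.out.printf("full fetch needs ~%.0f s%n", total * 8 / (mbps * 1_000_000));
    }
}
```

So at the observed rate the connection was never idle for a moment, yet
the fetch could not possibly finish inside the 120 s cap.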
> >
> > If the community agrees this is a bug I'd be happy to raise a JIRA
> > ticket and attempt a fix, which I think should be simple.
> >
> > What do you all think?
> >
> > Thanks,
> > Luke
>
