[
https://issues.apache.org/jira/browse/SOLR-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Gerlowski updated SOLR-13038:
-----------------------------------
Attachment: SOLR-13038.patch
> Overseer actions fail with NoHttpResponseException following a node restart
> ---------------------------------------------------------------------------
>
> Key: SOLR-13038
> URL: https://issues.apache.org/jira/browse/SOLR-13038
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Affects Versions: master (8.0)
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Attachments: SOLR-13038.patch
>
>
> I noticed recently that a lot of overseer operations fail if they're executed
> right after a restart of a Solr node. The failure returns a message like
> "org.apache.solr.client.solrj.SolrServerException:IOException occured when
> talking to server at: https://127.0.0.1:62253/solr". The logs are a bit more
> helpful:
> {code}
> org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: https://127.0.0.1:62253/solr
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657) ~[java/:?]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[java/:?]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[java/:?]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[java/:?]
> at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172) ~[java/:?]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_172]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_172]
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) ~[metrics-core-3.2.6.jar:3.2.6]
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[java/:?]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_172]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_172]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
> Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:62253 failed to respond
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[httpcore-4.4.10.jar:4.4.10]
> at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[httpcore-4.4.10.jar:4.4.10]
> at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[httpcore-4.4.10.jar:4.4.10]
> at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[httpcore-4.4.10.jar:4.4.10]
> at org.apache.solr.util.stats.InstrumentedHttpRequestExecutor.execute(InstrumentedHttpRequestExecutor.java:120) ~[java/:?]
> at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[httpclient-4.5.6.jar:4.5.6]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:542) ~[java/:?]
> ... 12 more
> {code}
> After a bit of debugging I was able to confirm the problem: when some
> non-overseer node gets restarted, the overseer never notices that its
> connections are invalid and will try to reuse them for subsequent requests
> that happen right after the restart.
> There are a few ways we might be able to tackle this:
> * We could add logic to {{SolrHttpRequestRetryHandler}} to retry when this
> happens. SHRRH already retries NoHttpResponseException generally, but has
> other logic which prevents any retries on collection/core-admin APIs. Maybe
> we could refine that logic a bit.
> * We could add retry logic to the {{HttpShardHandler}} code that makes these
> requests. We could do this across the board, or more selectively for only
> the overseer commands that are "retry-able".
> * We could tweak how our connection pool is managed so that it evicts these
> idle connections more aggressively. It seems like something similar has
> already been tried (without success) on SOLR-6944.
> Not sure what the right approach is. Intermittent NoHttpResponseExceptions
> seem to have been a problem in Solr (and its tests) going back at least 5
> years or so. Several JIRAs have suggested adding retries for NHRE in the
> past, but they were killed because not all APIs are idempotent. Other JIRAs
> have focused on fixing this at the (very broad) SolrClient level.
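
To make the "selective retry" option from the list above concrete, here is a minimal sketch in plain Java. It is not taken from the attached patch and does not use Solr or HttpClient APIs: the class `SelectiveRetry`, the method `callWithRetry`, the `retryable` flag, and the `StaleConnectionException` type (a stand-in for `org.apache.http.NoHttpResponseException`) are all hypothetical names chosen for illustration. The idea shown is that a request is repeated only when it failed with the stale-connection error and the caller has declared the operation safe to repeat.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

final class SelectiveRetry {

    /** Hypothetical stand-in for org.apache.http.NoHttpResponseException. */
    static class StaleConnectionException extends IOException {
        StaleConnectionException(String message) {
            super(message);
        }
    }

    /**
     * Runs the request. A StaleConnectionException triggers another attempt
     * only if the caller marked the operation as retryable (idempotent);
     * every other exception, and every non-retryable operation, fails fast.
     */
    static <T> T callWithRetry(Callable<T> request, boolean retryable, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        // Non-retryable operations get exactly one attempt.
        int allowedAttempts = retryable ? maxAttempts : 1;
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (StaleConnectionException e) {
                if (attempt >= allowedAttempts) {
                    throw e; // out of attempts: surface the original failure
                }
                // Otherwise loop and issue the request again.
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: the first call hits a dead pooled connection.
        Callable<String> flaky = () -> {
            if (++calls[0] == 1) {
                throw new StaleConnectionException("127.0.0.1:62253 failed to respond");
            }
            return "OK";
        };
        System.out.println(callWithRetry(flaky, true, 2) + " after " + calls[0] + " attempts");
    }
}
```

In a real change, the `retryable` decision would have to come from knowledge of which overseer commands are idempotent, which is exactly the open question raised above; the sketch only shows where that decision would plug in.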