Varun Thacker created SOLR-11881: ------------------------------------ Summary: Connection Reset Causing LIR Key: SOLR-11881 URL: https://issues.apache.org/jira/browse/SOLR-11881 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Varun Thacker
We can see that a connection reset is causing LIR. If a leader -> replica update get's a connection like this the leader will initiate LIR {code} 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:person s:shard7_1 r:core_node56 x:person_shard7_1_replica1] o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on replica https://host08.domain:8985/solr/person_shard7_1_replica2/ java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:210) at java.net.SocketInputStream.read(SocketInputStream.java:141) at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) at sun.security.ssl.InputRecord.read(InputRecord.java:503) at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543) at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409) at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177) at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304) at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} >From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy >working SolrCloud cluster, even a rare response like this from a replica can >cause a recovery and heavy cluster disruption" . Looking at SOLR-6931 we added a http retry handler but we only retry on GET requests. Updates are POST requests {{ConcurrentUpdateSolrClient#sendUpdateStream}} Update requests between the leader and replica should be retry-able since they have been versioned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org