[ https://issues.apache.org/jira/browse/SOLR-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Varun Thacker updated SOLR-11881: --------------------------------- Attachment: SOLR-11881.patch > Connection Reset Causing LIR > ---------------------------- > > Key: SOLR-11881 > URL: https://issues.apache.org/jira/browse/SOLR-11881 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Varun Thacker > Priority: Major > Attachments: SOLR-11881.patch > > > We can see that a connection reset is causing LIR. > If a leader -> replica update get's a connection like this the leader will > initiate LIR > {code} > 2018-01-08 17:39:16.980 ERROR (qtp1010030701-468988) [c:person s:shard7_1 > r:core_node56 x:person_shard7_1_replica1] > o.a.s.u.p.DistributedUpdateProcessor Setting up to try to start recovery on > replica https://host08.domain:8985/solr/person_shard7_1_replica2/ > java.net.SocketException: Connection reset > at java.net.SocketInputStream.read(SocketInputStream.java:210) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) > at sun.security.ssl.InputRecord.read(InputRecord.java:503) > at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) > at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) > at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) > at > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:543) > at > org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:409) > at > org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177) > at > org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:304) > at > org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446) > at > org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:312) > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > From https://issues.apache.org/jira/browse/SOLR-6931 Mark says "On a heavy > working SolrCloud cluster, even a rare response like this from a replica can > cause a recovery and heavy cluster disruption" . > Looking at SOLR-6931 we added a http retry handler but we only retry on GET > requests. Updates are POST requests > {{ConcurrentUpdateSolrClient#sendUpdateStream}} > Update requests between the leader and replica should be retry-able since > they have been versioned. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org