[ https://issues.apache.org/jira/browse/SOLR-12842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736137#comment-16736137 ]
Webster Homer commented on SOLR-12842:
--------------------------------------

I see this issue a lot; it is a major maintenance headache. It seems that network issues can cause corruption, and when they do, CDCR replication for that collection stops until we delete the bad tlog. I guess we see this several times a month, across multiple collections.

> CDCR stops replication and goes into infinite loop of retrying, if one of the updates are corrupted.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12842
>                 URL: https://issues.apache.org/jira/browse/SOLR-12842
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: CDCR
>    Affects Versions: master (9.0)
>            Reporter: Amrit Sarkar
>            Priority: Major
>
> Currently, CDCR reads updates from the transaction logs, creates an UpdateRequest, and forwards it to the target. If the forwarded UpdateRequest fails, the same request is retried indefinitely until a successful acknowledgment is received.
> {code}
> 2018-10-03 00:05:38.878 WARN (cdcr-replicator-35-thread-4) [ ] o.a.s.h.CdcrReplicator Failed to forward update request to target: DEMO_DR
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.202.153.32:3987/solr/DEMO_DR: version conflict for ec53e70d-72fb-42bd-827e-a3fc54e33bad expected=1613259487692455936 actual=-1
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
>     at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:120) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
>     at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
>     at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0-zing_18.07.1.0]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0-zing_18.07.1.0]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0-zing_18.07.1.0]
> {code}
> This design is appropriate when the connection to the target is broken and forwarding needs to halt, but better error handling should be in place when a bad payload or other external factors cause the failures. Suggestions and feedback are welcome.
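
To make the distinction in the description concrete, here is a minimal sketch of a forwarder that keeps halting on connection failures but gives up on a single update after a bounded number of rejections. It is not the CDCR code and not a proposed patch: `BoundedForwarder`, `Target`, `RejectedUpdateException`, `Outcome`, and `maxRejections` are all hypothetical names invented for this illustration, and a plain `String` stands in for the update payload. Whether skipping an update is acceptable at all is a separate design question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in Solr. */
public class BoundedForwarder {

    /** Stand-in for whatever sends one update to the target cluster. */
    public interface Target {
        void send(String update) throws IOException, RejectedUpdateException;
    }

    /** Hypothetical: the target was reachable but refused this update. */
    public static class RejectedUpdateException extends Exception {
        public RejectedUpdateException(String message) {
            super(message);
        }
    }

    public enum Outcome { SENT, SKIPPED, TARGET_UNREACHABLE }

    private final Target target;
    private final int maxRejections;
    private final List<String> skipped = new ArrayList<>();

    public BoundedForwarder(Target target, int maxRejections) {
        if (maxRejections < 1) {
            throw new IllegalArgumentException("maxRejections must be >= 1");
        }
        this.target = target;
        this.maxRejections = maxRejections;
    }

    /** Tries to forward one update, retrying rejections at most maxRejections times. */
    public Outcome forward(String update) {
        int rejections = 0;
        while (true) {
            try {
                target.send(update);
                return Outcome.SENT;
            } catch (IOException e) {
                // Connection problem: report it so the caller can halt
                // forwarding and come back to the same update later.
                return Outcome.TARGET_UNREACHABLE;
            } catch (RejectedUpdateException e) {
                rejections++;
                if (rejections >= maxRejections) {
                    // Payload problem: record the update and move on
                    // instead of retrying it forever.
                    skipped.add(update);
                    return Outcome.SKIPPED;
                }
            }
        }
    }

    /** Updates that were given up on, kept for later inspection. */
    public List<String> skipped() {
        return skipped;
    }
}
```

A caller would stop reading from the log on `TARGET_UNREACHABLE` and advance past the update on `SENT` or `SKIPPED`; the `skipped()` list is there so that dropped updates are at least visible to an operator.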