Hi all, we have a problem with a SolrCloud 5.5.0 cluster: one node has most of the collections on it marked as "Recovering" or "Recovery Failed". It attempts to recover from the leader, but the leader responds with:

Error while trying to recover. core=iris_shard1_replica1:java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://172.31.1.171:30000/solr: We are not the leader
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:596)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:353)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:224)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://172.31.1.171:30000/solr: We are not the leader
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:576)
    at org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:284)
    at org.apache.solr.client.solrj.impl.HttpSolrClient$1.call(HttpSolrClient.java:280)
    ... 5 more

Recovery never completes. Each collection in this state has plenty of active replicas (10+), but stopping the server that is marked as the leader doesn't trigger a leader election amongst these replicas.

REBALANCELEADERS did nothing. FORCELEADER complains that there is already a leader. FORCELEADER with the purported leader stopped took 45 seconds, reported a status of "0" (and no other message), and kept the down node as the leader (!)

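In case it helps anyone reproduce the check: below is a minimal Python sketch, not taken from our setup, of how the registered leader for each shard could be cross-checked against the live nodes via the Collections API CLUSTERSTATUS action. The function names are hypothetical, and the code assumes the JSON response has the layout cluster -> collections -> shards -> replicas, with "leader", "state" and "node_name" on each replica and a "live_nodes" list under "cluster".

```python
import json
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict (base_url is e.g. http://host:port/solr)."""
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def suspicious_shards(status):
    """Return (collection, shard, core, reason) tuples for shards whose
    registered leader looks wrong. Assumes the response layout described above."""
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in sorted(cluster.get("collections", {}).items()):
        for shard_name, shard in sorted(coll.get("shards", {}).items()):
            replicas = shard.get("replicas", {}).values()
            # The leader flag is assumed to be reported as the string "true".
            leaders = [r for r in replicas
                       if str(r.get("leader", "")).lower() == "true"]
            if not leaders:
                problems.append((coll_name, shard_name, None,
                                 "no leader registered"))
                continue
            for r in leaders:
                if r.get("node_name") not in live:
                    problems.append((coll_name, shard_name, r.get("core"),
                                     "leader is on a node that is not live"))
                elif r.get("state") != "active":
                    problems.append((coll_name, shard_name, r.get("core"),
                                     "leader is not in the active state"))
    return problems
```

Something like suspicious_shards(fetch_cluster_status("http://172.31.1.171:30000/solr")) would then list the shards where the cluster state still names a stopped node as leader, which is the situation described above.
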
Deleting the failed collection from the failed node and re-adding it produces the same "Leader said I'm not the leader" error message.

Any other ideas?

Cheers
Tom