Noble Paul created SOLR-9226:
--------------------------------

             Summary: Automatically fire FORCELEADER if shard leader is missing
                 Key: SOLR-9226
                 URL: https://issues.apache.org/jira/browse/SOLR-9226
             Project: Solr
          Issue Type: Bug
            Reporter: Noble Paul
            Assignee: Noble Paul

We have often seen shards lose their leader.

{code}
x:lamp_2016050713_shard2_replica1] o.a.s.c.ZkController Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
	at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1044)
	at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1011)
	at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:967)
	at org.apache.solr.cloud.ZkController.register(ZkController.java:906)
	at org.apache.solr.cloud.ZkController.register(ZkController.java:849)
	at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
{code}

There could be other instances as well.

I recommend the following to heal such clusters:

* Whenever a node finds that its shard has no LEADER, it should fire the FORCELEADER command.
* FORCELEADER is currently executed on the node that receives the command. It should be moved to the overseer to ensure that we don't run multiple such commands in parallel.
* The command should make a best effort to identify a leader, and should assign one if at least one node in the shard is live.
* When a shard has lost its leader, it is very likely that thousands of such requests will be fired and clog the work queue. The command should ensure that duplicate FORCELEADER requests are consumed from the work queue.
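
The de-duplication asked for in the last recommendation could look roughly like the sketch below. It is only an illustration of the idea, not Solr's overseer work queue: the class name ForceLeaderRequestQueue, its methods, and the "collection/shard" key format are all hypothetical. It keeps at most one pending FORCELEADER request per shard, so a flood of identical requests collapses into a single queue entry.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Optional;

/**
 * Hypothetical sketch (not a Solr class): holds at most one pending
 * FORCELEADER request per (collection, shard) pair.
 */
class ForceLeaderRequestQueue {
    // LinkedHashSet rejects duplicates and preserves arrival order.
    private final LinkedHashSet<String> pending = new LinkedHashSet<>();

    // Assumed key format for illustration only.
    private static String key(String collection, String shard) {
        return collection + "/" + shard;
    }

    /** Returns true if queued, false if an identical request is already pending. */
    synchronized boolean offer(String collection, String shard) {
        return pending.add(key(collection, shard));
    }

    /** Removes and returns the oldest pending request key, if any. */
    synchronized Optional<String> poll() {
        Iterator<String> it = pending.iterator();
        if (!it.hasNext()) {
            return Optional.empty();
        }
        String next = it.next();
        it.remove();
        return Optional.of(next);
    }

    /** Number of distinct shards with a pending request. */
    synchronized int size() {
        return pending.size();
    }
}
```

Because a key is removed from the set when it is polled, a shard that loses its leader again later can be queued again; only requests that arrive while one is already pending are dropped.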