[ https://issues.apache.org/jira/browse/SOLR-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339833#comment-15339833 ]

Ishan Chattopadhyaya commented on SOLR-9226:
--------------------------------------------

Automatic triggering of FORCELEADER could lead to silent data loss. I suggest
that we add a configuration knob to turn automatic triggering of FORCELEADER
on or off, so that users who want high availability can enable it, and those
who favour consistency above all else can disable it.
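A minimal sketch of such a knob in Java. The class `AutoForceLeaderPolicy`, the `autoForceLeader` flag and the submit callback are hypothetical names used only for illustration, not existing Solr APIs. Where the flag would be configured (for example, a cluster property) is left open.

```java
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: decides whether a node that finds a leaderless shard
 * should automatically request FORCELEADER. Not an existing Solr class.
 */
final class AutoForceLeaderPolicy {
    // Knob: true favours availability (auto-heal), false favours consistency.
    private final boolean autoForceLeader;
    // Hypothetical callback that submits the request (e.g. to the Overseer queue).
    private final BiConsumer<String, String> forceLeaderSubmitter;

    AutoForceLeaderPolicy(boolean autoForceLeader,
                          BiConsumer<String, String> forceLeaderSubmitter) {
        this.autoForceLeader = autoForceLeader;
        this.forceLeaderSubmitter = forceLeaderSubmitter;
    }

    /**
     * Called when a node finds no leader for (collection, shard).
     * Returns true if a FORCELEADER request was submitted.
     */
    boolean onLeaderMissing(String collection, String shard) {
        if (!autoForceLeader) {
            // Consistency first: leave recovery to an operator rather than
            // risk silently electing a replica that is missing updates.
            return false;
        }
        forceLeaderSubmitter.accept(collection, shard);
        return true;
    }
}
```

With the knob off, a leaderless shard stays leaderless until someone issues FORCELEADER by hand; with it on, every node that notices the missing leader submits a request, which is why deduplication on the receiving side matters.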

> Automatically fire FORCELEADER if shard leader is missing
> ---------------------------------------------------------
>
>                 Key: SOLR-9226
>                 URL: https://issues.apache.org/jira/browse/SOLR-9226
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> We have often seen shards lose their leader.
> {code}
> x:lamp_2016050713_shard2_replica1] o.a.s.c.ZkController Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1044)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1011)
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:967)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:906)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:849)
>         at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
> {code}
> There could be other such instances as well.
> I recommend the following to heal such clusters:
> * Whenever a node finds that the shard has no leader, it should fire the
> FORCELEADER command.
> * The FORCELEADER command is currently executed on the node that receives it.
> It should be moved to the Overseer to ensure that multiple such commands do
> not run in parallel.
> * The command should make a best effort to identify a leader, and should
> assign one if at least one node in the shard is live.
> * When a shard has lost its leader, thousands of such requests are likely to
> be fired, and they would clog the work queue. The command should ensure that
> duplicate FORCELEADER requests are drained from the work queue.
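The queueing and leader-selection points in the list above can be sketched as follows, assuming a single Overseer thread consumes the queue so that commands never run in parallel. `ForceLeaderQueue`, `submit`, `next` and `chooseLeader` are hypothetical names, not Solr's Overseer API. Picking the first live replica is only a placeholder: a real implementation would have to prefer the replica with the most recent data, which is exactly where the data-loss concern in the comment comes from.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of an Overseer-side FORCELEADER work queue. Requests are
 * keyed by (collection, shard); a request whose key is already pending is
 * dropped, so a burst of identical requests collapses into one entry.
 */
final class ForceLeaderQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Set<String> pendingKeys = new HashSet<>();

    private static String key(String collection, String shard) {
        return collection + "/" + shard;
    }

    /** Enqueues a request; returns false if an identical one is already pending. */
    synchronized boolean submit(String collection, String shard) {
        String k = key(collection, shard);
        if (!pendingKeys.add(k)) {
            return false; // duplicate: consumed without adding work
        }
        pending.addLast(k);
        return true;
    }

    /** Takes the next request in FIFO order; meant for a single consumer thread. */
    synchronized Optional<String> next() {
        String k = pending.pollFirst();
        if (k != null) {
            pendingKeys.remove(k); // key released once processing starts
        }
        return Optional.ofNullable(k);
    }

    synchronized int size() {
        return pending.size();
    }

    /** Best effort: the first live replica, or empty if no replica is live. */
    static Optional<String> chooseLeader(List<String> replicas, Predicate<String> isLive) {
        return replicas.stream().filter(isLive).findFirst();
    }
}
```

Because the key is released when a request is dequeued, a request that arrives while FORCELEADER is already running for that shard is queued once more; holding the key until the command completes would suppress those as well.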



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)