[jira] [Commented] (SOLR-7569) Create an API to force a leader election between nodes

Mark Miller (JIRA) Wed, 11 Nov 2015 09:38:47 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000753#comment-15000753
 ]


Mark Miller commented on SOLR-7569:
-----------------------------------

A better approach is probably for this API to deal with a DOWN but valid leader 
itself. That state should only ever arise from manually screwing up LIR 
(leader-initiated recovery), and if this API is messing with LIR, it should 
also fix the ramifications.

Perhaps the last thing the API should do is run through each shard and see if 
the registered leader is DOWN, and if it is, make it ACTIVE (preferably by 
asking it to publish itself as ACTIVE - we don't want to publish for someone 
else). If the call waits around to make sure all the leaders come up, this 
should be simple.
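
A rough sketch of that final pass, in Python rather than Solr's Java so it 
stays self-contained. Everything here is hypothetical and not taken from any 
patch on this issue: the Leader record, the get_leaders and 
request_publish_active callbacks, and the timeout values are stand-ins for 
whatever the cluster state and core admin calls would really look like. The 
only point is the shape of the loop: ask each DOWN leader to publish itself, 
then wait until every leader reports ACTIVE.

```python
import time
from dataclasses import dataclass

# Hypothetical state names; real Solr replica states are not modeled here.
DOWN = "down"
ACTIVE = "active"


@dataclass
class Leader:
    """Hypothetical view of one shard's registered leader."""
    shard: str
    node: str
    state: str


def fix_down_leaders(get_leaders, request_publish_active,
                     timeout_s=5.0, poll_s=0.1):
    """Ask every DOWN leader to publish itself ACTIVE, then wait for all leaders.

    get_leaders: callable returning the current list of Leader records.
    request_publish_active: callable that asks the leader's own node to
        publish ACTIVE (we never publish state on another node's behalf).
    Returns the shards whose leaders were asked; raises TimeoutError if
    some leader is still not ACTIVE when the deadline passes.
    """
    asked = []
    for leader in get_leaders():
        if leader.state == DOWN:
            request_publish_active(leader)
            asked.append(leader.shard)

    deadline = time.monotonic() + timeout_s
    while True:
        not_active = [l.shard for l in get_leaders() if l.state != ACTIVE]
        if not not_active:
            return asked
        if time.monotonic() >= deadline:
            raise TimeoutError("leaders not ACTIVE for shards: %s" % not_active)
        time.sleep(poll_s)
```

Passing the two callbacks in keeps the "who publishes" decision outside the 
loop, so the pass itself only ever requests a state change and then observes.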

> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
>                 Key: SOLR-7569
>                 URL: https://issues.apache.org/jira/browse/SOLR-7569
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Noble Paul
>              Labels: difficulty-medium, impact-high
>             Fix For: 5.4, Trunk
>
>         Attachments: SOLR-7569-testfix.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard, e.g. 
> all replicas' last published state was recovery, or bugs which cause a 
> leader to be marked as 'down'. While the best solution is that shards never 
> get into this state, we need a manual way to fix a shard when it does. Right 
> now we can do a series of steps involving bouncing the node (since the 
> recovery paths for bouncing and REQUESTRECOVERY are different), but that is 
> difficult when running a large cluster. Such a manual API may lead to some 
> data loss, but in some cases it is the only possible option to restore 
> availability.
> This issue proposes to build a new collection API which can be used to force 
> replicas into recovering a leader while avoiding data loss on a best effort 
> basis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

