[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-7569:
---------------------------------------
    Attachment: SOLR-7569.patch

bq. This should just get you out of the state that is preventing a leader from 
being elected.
Updating the patch that attempts to do just that. :-)
* Clear LIR znodes
* Mark all nodes as active
* Wait for leader election so that normal state is restored 

(I also tried to mark just one of the replicas as active instead of all the 
replicas, hoping it will become leader and others would recover from it. 
However, this resulted in one of the other down replicas becoming leader but 
still staying down. Looking into why that could be happening; bug?)

> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
>                 Key: SOLR-7569
>                 URL: https://issues.apache.org/jira/browse/SOLR-7569
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>              Labels: difficulty-medium, impact-high
>         Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard e.g. all 
> replicas' last published state was recovery or due to bugs which cause a 
> leader to be marked as 'down'. While the best solution is that they never get 
> into this state, we need a manual way to fix this when it does get into this  
> state. Right now we can do a series of dance involving bouncing the node 
> (since recovery paths between bouncing and REQUESTRECOVERY are different), 
> but that is difficult when running a large cluster. Although it is possible 
> that such a manual API may lead to some data loss but in some cases, it is 
> the only possible option to restore availability.
> This issue proposes to build a new collection API which can be used to force 
> replicas into recovering a leader while avoiding data loss on a best effort 
> basis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to