[https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740684#comment-14740684]

Timothy Potter commented on SOLR-7569:
--------------------------------------

Looks good, Ishan. Sorry for the delay in getting a review done. In
putNonLeadersIntoLIR, you probably want to wait a little before killing the
leader after sending doc #2, to give the leader time to put the replicas into
LIR. This happens quickly on our local workstations but can take longer on
Jenkins.
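A fixed sleep is the simplest way to add that wait, but one alternative that copes with slow Jenkins hosts is to poll a condition until it holds or a timeout expires. The sketch below is a generic helper under that assumption; the class name WaitUtil and the method waitFor are hypothetical and not part of Solr or its test framework.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not a Solr API: polls a condition instead of using a fixed sleep.
public final class WaitUtil {
    private WaitUtil() {}

    /**
     * Evaluates condition every pollMs milliseconds until it returns true
     * or timeoutMs milliseconds have elapsed.
     * Returns true if the condition was met, false on timeout.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // condition satisfied, stop waiting
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            Thread.sleep(pollMs); // back off before checking again
        }
    }
}
```

In putNonLeadersIntoLIR, the condition would be a check, which you would have to write yourself, that the non-leader replicas are now marked as in LIR. Assert that waitFor returned true before killing the leader, so that a slow run fails with a clear message rather than racing.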

I'm also wondering whether, in the testReplicasInLIRNoLeader test, you should
bring the original downed leader (the one killed in the putNonLeadersIntoLIR
method) back into the mix after the new leader is elected, and check what state
it comes back in. Also, try sending another doc, #5, once the Jetty hosting the
original leader is back online.

Lastly, what happened to the idea of letting the user pick the leader as part
of the recover shard request? I read the comments above and agree that simply
triggering a re-election is preferable, but sometimes we humans actually know
which replica is best. It seems reasonable to me to accept an optional
parameter that specifies the replica to select. However, if others don't like
that idea, I'm fine with this for now.

> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
>                 Key: SOLR-7569
>                 URL: https://issues.apache.org/jira/browse/SOLR-7569
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>              Labels: difficulty-medium, impact-high
>         Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr may fail to elect a leader for a shard, e.g.
> all replicas' last published state was 'recovering', or bugs caused a leader
> to be marked as 'down'. While the best solution is that they never get into
> this state, we need a manual way to fix it when they do. Right now we can do
> a dance that involves bouncing the node (since the recovery paths for
> bouncing and for REQUESTRECOVERY are different), but that is difficult when
> running a large cluster. Although such a manual API may lead to some data
> loss, in some cases it is the only way to restore availability.
> This issue proposes to build a new collection API that can be used to force
> replicas to recover a leader while avoiding data loss on a best-effort
> basis.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
