[
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000753#comment-15000753
]
Mark Miller commented on SOLR-7569:
-----------------------------------
A better approach is probably for this API to deal with a DOWN but valid leader
itself. That state should only ever arise from manually screwing up LIR
(leader-initiated recovery), and if this API is messing with LIR, it should
also fix the ramifications.

Perhaps the last thing the API should do is run through each shard and check
whether the registered leader is DOWN, and if it is, make it ACTIVE (preferably
by asking it to publish itself as ACTIVE - we don't want to publish for someone
else). If the call waits around to make sure all the leaders come up, this
should be simple.
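
The final step described above could be sketched roughly as follows. This is
only an illustration of the idea, not code from any of the attached patches:
the Leader interface, the State enum and the method name are hypothetical
stand-ins rather than Solr classes, and the polling loop is an assumption
about how "waits around" might be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LeaderStateFixSketch {

    /** Hypothetical replica states; not Solr's own state enum. */
    enum State { ACTIVE, DOWN, RECOVERING }

    /** Hypothetical handle on a shard's registered leader. */
    interface Leader {
        State state();

        /** Asks the leader to publish itself as ACTIVE; we never publish for it. */
        void requestPublishActive();
    }

    /**
     * For each shard, asks a DOWN registered leader to publish itself as ACTIVE,
     * then polls until every leader is ACTIVE or the timeout expires.
     * Returns the names of shards whose leader is still not ACTIVE.
     */
    static List<String> reactivateDownLeaders(Map<String, Leader> leadersByShard,
                                              long timeoutMs,
                                              long pollMs) throws InterruptedException {
        // Step 1: only leaders that are registered but DOWN are asked to republish.
        for (Leader leader : leadersByShard.values()) {
            if (leader.state() == State.DOWN) {
                leader.requestPublishActive();
            }
        }

        // Step 2: wait around until all leaders have come up, or give up.
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            List<String> pending = new ArrayList<>();
            for (Map.Entry<String, Leader> entry : leadersByShard.entrySet()) {
                if (entry.getValue().state() != State.ACTIVE) {
                    pending.add(entry.getKey());
                }
            }
            if (pending.isEmpty() || System.nanoTime() >= deadline) {
                return pending;
            }
            Thread.sleep(pollMs);
        }
    }
}
```

An empty return value would mean every registered leader came up before the
timeout; anything else names the shards the caller still has to look at.
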
> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Noble Paul
> Labels: difficulty-medium, impact-high
> Fix For: 5.4, Trunk
>
> Attachments: SOLR-7569-testfix.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr may fail to elect a leader for a shard, e.g.
> all replicas' last published state was recovery, or bugs that cause a leader
> to be marked as 'down'. While the best solution is for shards never to get
> into this state, we need a manual way to fix it when they do. Right now we
> can do a dance that involves bouncing the node (since the recovery paths for
> bouncing and REQUESTRECOVERY are different), but that is difficult when
> running a large cluster. Such a manual API may lead to some data loss, but in
> some cases it is the only possible option to restore availability.
>
> This issue proposes to build a new collection API which can be used to force
> replicas into recovering a leader while avoiding data loss on a best-effort
> basis.