[
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya updated SOLR-7569:
---------------------------------------
Attachment: SOLR-7569.patch
Based on an offline conversation with Shalin (and the discussion above), I've
removed the extra handling of the situation where:
# there is no leader-initiated recovery (LIR) involved
# all replicas are down
# there is no leader.
That handling involved force-marking the replica at the head of the election
queue as the leader, which might have unintended consequences. Hopefully, this
situation never occurs in the real world. If it does, we can tackle it in a
separate issue.
The following situation is still taken care of:
# there is no LIR involved
# all replicas are down
[~shalinmangar] please review the changes. Thanks.
> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard, e.g. all
> replicas' last published state was recovery, or bugs cause a leader to be
> marked as 'down'. While the best solution is for a shard never to get into
> this state, we need a manual way to fix it when it does. Right now we can go
> through a series of steps involving bouncing the node (since the recovery
> paths for bouncing and REQUESTRECOVERY are different), but that is difficult
> when running a large cluster. Although such a manual API may lead to some
> data loss, in some cases it is the only possible option to restore
> availability.
> This issue proposes to build a new collection API which can be used to force
> replicas into recovering a leader while avoiding data loss on a best effort
> basis.
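
As a rough illustration of how a client might call a collection API like the one proposed above, the sketch below only builds the request URL and sends nothing. The action name `FORCELEADER`, the `collection` and `shard` parameters, the base URL, and the helper name `build_force_leader_url` are all assumptions for illustration; this message does not name the final API or its parameters.

```python
from urllib.parse import urlencode


def build_force_leader_url(base_url, collection, shard, action="FORCELEADER"):
    """Return a Collections API URL asking Solr to force a leader election.

    The action name and the parameter names are assumptions made for this
    sketch; they are not taken from this message.
    """
    # urlencode keeps the insertion order of the dict and escapes the values.
    params = urlencode({"action": action, "collection": collection, "shard": shard})
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Hypothetical host, collection, and shard names.
    print(build_force_leader_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

Since the description warns that forcing a leader may lose data, a call like this would presumably be a last resort, issued per shard only after normal recovery has failed.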