[ https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ishan Chattopadhyaya updated SOLR-7569:
---------------------------------------
    Attachment: SOLR-7569.patch

bq. This particular collection admin operation does not really have to go to the overseer; it can be performed by the receiving node itself, because the clearing of the LIR node does not have to be done at the overseer anyway

Here is a patch that adds the API command (FORCELEADER) to the CollectionsHandler instead of the OCMH. I couldn't find a way to make this ASYNC here, which I could do at the OCMH. Did I miss something? Does this look fine? ([~noble.paul] ?)
I somehow feel that doing it in the CollectionsHandler is a bit misplaced, and would rather do it at the OCMH. But I am fine either way, so long as we do it; both patches are there.

Note: As with the previous patch (which puts the meat into the OCMH), this patch depends on prior application of the patches in SOLR-8233 and SOLR-7989.

> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
> Key: SOLR-7569
> URL: https://issues.apache.org/jira/browse/SOLR-7569
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Labels: difficulty-medium, impact-high
> Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch,
> SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr may not elect a leader for a shard, e.g. all
> replicas' last published state was 'recovering', or bugs caused the
> leader to be marked as 'down'. While the best solution is that they never get
> into this state, we need a manual way to fix this when it does get into this
> state.
> Right now we can work around this with a series of steps involving bouncing the node
> (since the recovery paths for bouncing and for REQUESTRECOVERY are different),
> but that is difficult when running a large cluster. Although such a manual API
> may lead to some data loss, in some cases it is the only option to restore
> availability.
> This issue proposes to build a new collection API which can be used to force
> replicas to recover a leader while avoiding data loss on a best-effort
> basis.
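For reference, a FORCELEADER call would be issued like any other Collections API action. The sketch below is illustrative only. It assumes the action is exposed at `/admin/collections` and takes `collection` and `shard` parameters, as in the command later documented in the Solr Reference Guide. The base URL, collection name, and shard name are hypothetical. No `async` parameter is passed, since this version of the patch handles the command in the CollectionsHandler rather than the OCMH.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class ForceLeaderRequest {

    // Builds a Collections API URL for the FORCELEADER action.
    // Assumption: the action takes "collection" and "shard" parameters.
    static String buildUrl(String solrBaseUrl, String collection, String shard) {
        return solrBaseUrl + "/admin/collections?action=FORCELEADER"
                + "&collection=" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&shard=" + URLEncoder.encode(shard, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults; pass real values as arguments.
        String base = args.length > 0 ? args[0] : "http://localhost:8983/solr";
        String collection = args.length > 1 ? args[1] : "collection1";
        String shard = args.length > 2 ? args[2] : "shard1";

        String url = buildUrl(base, collection, shard);
        System.out.println("GET " + url);

        // Only contact a live cluster when a 4th argument "send" is given.
        if (args.length > 3 && args[3].equals("send")) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Running it without arguments only prints the request URL. Passing `send` as the fourth argument issues a synchronous GET against the given node. Because the request goes straight to the receiving node's handler, the caller waits for the result instead of polling with REQUESTSTATUS.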