[jira] [Commented] (SOLR-5991) SolrCloud: Add API to move leader off a Solr instance

Hoss Man (JIRA) Thu, 17 Apr 2014 15:56:21 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973533#comment-13973533
 ]


Hoss Man commented on SOLR-5991:
--------------------------------

Off the cuff: it sounds like what you'd really want for these types of 
use cases is:

1) an "AVOID_RESPONSIBILITY" role which tells a node it should never 
participate in elections -- either for shard leader, or for overseer.
2) per-node status info (from /admin/system) about whether this node is the 
overseer (SOLR-5823) and/or hosts the leader of any shard 
3) a "forceelection" Collection API action (that takes an optional collection 
name and shard name - so it can force overseer election, or leader election of 
all shards, or leader election of a specific shard)
4) logic in CoreContainer.shutdown() that causes the node to do the following 
before finishing a clean shutdown:
* act as if it has the AVOID_RESPONSIBILITY role (w/o updating its actual zk 
state) until completion of shutdown
* loop over its current responsibilities and self-trigger the necessary 
"forceelection" commands to elect someone else to take its place as 
overseer/shard-leader(s)
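
To make points 3 and 4 concrete, here is a rough sketch in Python of how the 
pieces could fit together. It is a toy in-memory model, not Solr code: the 
Cluster class, its fields, and the "first eligible candidate wins" election 
rule are all assumptions made up for illustration. Only the behaviour (optional 
collection/shard arguments, and draining a node before shutdown without 
touching its stored role) comes from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy in-memory model of cluster state (hypothetical, not Solr's)."""
    nodes: list      # names of live nodes
    overseer: str    # node currently acting as overseer
    replicas: dict   # (collection, shard) -> nodes hosting a replica
    leaders: dict    # (collection, shard) -> current leader node
    avoid: set = field(default_factory=set)     # nodes assigned AVOID_RESPONSIBILITY
    draining: set = field(default_factory=set)  # nodes acting as if they had the role

    def _pick(self, candidates, current):
        # Simplified election: first live, eligible candidate other than the
        # current holder wins; if there is none, the current holder stays.
        for name in candidates:
            if (name != current and name in self.nodes
                    and name not in self.avoid and name not in self.draining):
                return name
        return current

    def force_election(self, collection=None, shard=None):
        """No arguments: overseer. Collection only: all its shards. Both: one shard."""
        if collection is None:
            self.overseer = self._pick(self.nodes, self.overseer)
            return
        for key in self.replicas:
            if key[0] == collection and shard in (None, key[1]):
                self.leaders[key] = self._pick(self.replicas[key], self.leaders[key])

    def clean_shutdown(self, node):
        # Act as if the node had the role, without changing its stored role.
        self.draining.add(node)
        # Hand off every responsibility the node currently holds.
        if self.overseer == node:
            self.force_election()
        for (collection, shard), leader in list(self.leaders.items()):
            if leader == node:
                self.force_election(collection, shard)
        self.nodes.remove(node)
        self.draining.discard(node)
```

Keeping "draining" separate from "avoid" mirrors the "w/o updating its actual 
zk state" part: a node that was never assigned the role comes back from a 
reboot as a normal election candidate.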

So...

* if you just want to reboot one node - you reboot that node, and instead of 
just acting like it's dropped off the face of the earth and potentially 
triggering elections when the ZK ephemeral nodes vanish, it proactively 
encourages an election first.
* If you want to shut down N machines permanently: you assign all of those N 
machines the role "AVOID_RESPONSIBILITY" in advance, and then iterate over them 
shutting them down.  Ones that had no responsibilities to begin with will 
shut down fast, nodes that did have responsibilities will shut down slower as 
they force elections - but none of the other machines you are about to shut down 
will take on those responsibilities.
* If you want to reboot N machines with minimal down time: you can iterate over 
your N machines checking their /admin/system response to see if they are the 
overseer or a shard leader -- if they are, you trigger the necessary 
action=forceelection commands and wait for them to complete.  When you are 
done, you should be able to shut down/restart all N nodes very quickly, and then 
remove the "AVOID_RESPONSIBILITY" role at your leisure.
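
As a sketch of what the operator-side loop in that last scenario might look 
like, assuming the N nodes have already been given the role so that elections 
don't land on another node in the batch: the FORCEELECTION action name, its 
parameter names, and the shape of the per-node status dict below are all 
hypothetical, since none of this exists yet. The code only builds the requests; 
actually sending them and waiting for completion is left out.

```python
from urllib.parse import urlencode


def force_election_query(collection=None, shard=None):
    """Request path for the proposed (hypothetical) forceelection action."""
    params = {"action": "FORCEELECTION"}
    if collection is not None:
        params["collection"] = collection
    if shard is not None:
        params["shard"] = shard
    return "/admin/collections?" + urlencode(params)


def plan_handoffs(status_by_node):
    """List the (node, request) pairs to issue before restarting the nodes.

    status_by_node maps a node name to a dict like
    {"overseer": bool, "leader_of": [(collection, shard), ...]},
    a made-up stand-in for the proposed per-node status info.
    """
    plan = []
    for node, status in status_by_node.items():
        if status.get("overseer"):
            # No collection/shard: force an overseer election.
            plan.append((node, force_election_query()))
        for collection, shard in status.get("leader_of", []):
            plan.append((node, force_election_query(collection, shard)))
    return plan
```

Nodes with no responsibilities contribute nothing to the plan, which matches 
the expectation that they can be restarted right away.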


> SolrCloud: Add API to move leader off a Solr instance
> -----------------------------------------------------
>
>                 Key: SOLR-5991
>                 URL: https://issues.apache.org/jira/browse/SOLR-5991
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.7.1
>            Reporter: Rich Mayfield
>
> Common maintenance chores require restarting Solr instances.
> The process of a shutdown becomes a whole lot more reliable if we can 
> proactively move any leadership roles off of the Solr instance we are going 
> to shut down. The leadership election process then runs immediately.
> I am not sure what the semantics should be (either accomplishes the goal but 
> one of these might be best):
> * A call to tell a core to give up leadership (thus the next replica is 
> chosen)
> * A call to specify which core should become the leader



--
This message was sent by Atlassian JIRA
(v6.2#6252)

