[ 
https://issues.apache.org/jira/browse/MESOS-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914707#comment-13914707
 ] 

Bill Farner commented on MESOS-890:
-----------------------------------

Ideally, the approach we come up with doesn't require others to have a 
ZooKeeper expert on hand to replicate it.

Has the inverse of #2 been considered?  i.e., having the slaves look for the 
master in both locations.  This requires more deploy steps, but AFAICT they 
are relatively easy to reason about.  A rough sketch of the idea is below.
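
A minimal sketch of that detection logic, using the stock ZooKeeper Java 
client (the connection strings and znode path are illustrative, and error 
handling is omitted):

    import org.apache.zookeeper.ZooKeeper;
    import java.util.Collections;
    import java.util.List;

    public class DualEnsembleMasterDetector {
      // Check the old ensemble first, then the new one; return the
      // contents of the leading master's znode, or null if no master
      // is registered in either location.
      static String findMaster(String znodePath, String... connectStrings)
          throws Exception {
        for (String connect : connectStrings) {
          ZooKeeper zk = new ZooKeeper(connect, 10000, event -> {});
          try {
            List<String> candidates = zk.getChildren(znodePath, false);
            if (!candidates.isEmpty()) {
              // Masters register ephemeral|sequential znodes; the lowest
              // sequence number is the leader.
              Collections.sort(candidates);
              byte[] data =
                  zk.getData(znodePath + "/" + candidates.get(0), false, null);
              return new String(data);
            }
          } finally {
            zk.close();
          }
        }
        return null;
      }
    }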

> Figure out a way to migrate a live Mesos cluster to a different ZooKeeper 
> cluster
> ---------------------------------------------------------------------------------
>
>                 Key: MESOS-890
>                 URL: https://issues.apache.org/jira/browse/MESOS-890
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>
> I've been chatting with [~vinodkone] about approaching a live ZK cluster 
> migration. Here are the options we came up with.
> For the descriptions, we treat `zk1` as the current working cluster, `obs` 
> as a bunch of ZooKeeper Observers [1], and `zk2` as the new cluster to which 
> we need to migrate. 
> Approach #1: Using Observers
> With this option we need to:
> * add obs to zk1 (a sample observer config is sketched below)
> * restart slaves to have them use obs to find their master
> * restart the framework, having it use obs to find the mesos master
> * restart the mesos masters, having them use obs to perform their election
> * we then stop all ZK obs and remove their data (since they will need to sync 
> up with an entirely new cluster, we need to lose the old data)
> * we restart ZK obs having them be part of zk2
> * at this point the slaves, the framework and the masters can reach the ZK 
> obs again and an election happens
> * optionally, you can restart the slaves, the framework and the masters 
> again, using zk2 instead of the ZK obs, if you want to decommission them. 
> This assumes that we can do the last three steps in << 75 secs (75 secs being 
> the slave health check timeout). This is a reasonable assumption if the data 
> size in zk2 is small enough to ensure that the ZK obs can sync up quickly 
> with zk2. If zk2 is a new cluster with no data, this should be very fast.
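> To make the obs manipulation concrete, here is a sketch of what an 
> observer's zoo.cfg might look like while attached to zk1 (hostnames, ports 
> and paths are illustrative):
>
>     # zoo.cfg on an observer host, while still attached to zk1
>     dataDir=/var/lib/zookeeper
>     clientPort=2181
>     peerType=observer
>     server.1=zk1-a:2888:3888
>     server.2=zk1-b:2888:3888
>     server.3=zk1-c:2888:3888
>     server.4=obs-a:2888:3888:observer
>
> Re-pointing the observer at zk2 means replacing the server.N lines with the 
> zk2 ensemble (keeping the :observer suffix on the observer's own line) and 
> wiping dataDir first, since the old zk1 data must not be carried over. 
> Restarting a slave against the observers could then look something like:
>
>     mesos-slave --master=zk://obs-a:2181,obs-b:2181/mesos ...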
> The good things about this approach are:
> * no mesos code change
> * it is very easy to roll back halfway through, if need be
> The hard issues are:
> * Manipulating the ZK obs (i.e.: stopping, removing the data from zk1 and 
> starting again) needs to be done with care. Messing up configs or not 
> removing the data from zk1 on any of the ZK obs will cause problems
> * we need to restart all slaves to have them use the ZK obs instead of 
> connecting to zk1 directly. But with slave recovery this isn't an issue, just 
> an extra step.
> * same thing for the framework and the masters
> Approach #2: Dual publishing from mesos masters
> With this option we would augment the election handling code in the mesos 
> masters to have it deal with the notion of primary and secondary ZK 
> clusters. Master registration and election would then work as follows (a 
> sketch in code follows the list):
> * create an ephemeral|sequential znode in zk1 (i.e.:  
> /path/to/znode/mesos_000023)
> * create an ephemeral, but not sequential, znode in zk2 with the exact same 
> path as what was created in zk1 (i.e.: /path/to/znode/mesos_000023)
> * make sure both sessions, in zk1 and zk2, are always in the same state 
> (i.e.: if one expires, the other one should be closed, etc.)
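> A minimal sketch of that dual publish, using the stock ZooKeeper Java 
> client (connection strings, the base path and the master info payload are 
> illustrative; watch and error handling are omitted):
>
>     import org.apache.zookeeper.CreateMode;
>     import org.apache.zookeeper.ZooDefs;
>     import org.apache.zookeeper.ZooKeeper;
>
>     public class DualPublish {
>       public static void main(String[] args) throws Exception {
>         byte[] info = "master-host:5050".getBytes();  // hypothetical payload
>         ZooKeeper zk1 = new ZooKeeper("zk1-a:2181", 10000, e -> {});
>         // Ephemeral|sequential registration in the primary cluster;
>         // create() returns the actual path, e.g. /path/to/znode/mesos_000023.
>         String path = zk1.create("/path/to/znode/mesos_", info,
>             ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
>         ZooKeeper zk2 = new ZooKeeper("zk2-a:2181", 10000, e -> {});
>         // Mirror the exact same path into the secondary cluster as a plain
>         // ephemeral (not sequential) znode, so the names match.
>         zk2.create(path, info, ZooDefs.Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL);
>         // A real implementation must tie the two sessions together: if
>         // either expires, close the other so both clusters stay consistent.
>       }
>     }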
> For now, let's omit a few implementation details which might need extra care 
> and assume we can make this work consistently, in such a way that zk2 
> accurately reflects elections that happen in zk1. This means that regardless 
> of being connected to zk1 or zk2, you always get the same master. Once we 
> have this, the migration steps would be:
> * restart slaves to have them use zk2 where masters can be found by virtue of 
> what we implemented above
> * restart the framework so that it finds the mesos master in zk2
> * stop all mesos masters (they all need to be stopped before moving to the 
> next step)
> * start all mesos masters using zk2 as their primary and only cluster (an 
> example invocation is below)
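> Restarting a master against zk2 only could look something like this (the 
> hostnames and znode path are illustrative):
>
>     mesos-master --zk=zk://zk2-a:2181,zk2-b:2181,zk2-c:2181/mesos ...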
> Again, this assumes we can do the last two steps in << 75 secs (or, if we 
> needed to, we could bump the slave health check timeout), which, again, 
> sounds achievable given that masters have no state and their start-up time 
> is very short.
> The good things about this approach are:
> * no tinkering with extra ZK servers or with ZK configs 
> The hard issues are:
> * extra code needs to be added to the election handling bits of the mesos 
> master to address a rare, but possible, use-case of cluster migration. It 
> might take a bit of time to get that code right. 
> * it's easier to end up in a bad state if any of the mesos masters has a bad 
> config or is restarted early and ends up publishing differently than the 
> other masters. This could lead to elections with differing results. 
> Thoughts?
> [1] http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html


