I am curious whether ZooKeeper has the same behavior and issue. Could we set up
some metrics to compare how etcd and ZooKeeper behave here? That would help
us define the correct scope.

2016-01-20 14:42 GMT+08:00 Shuai Lin <[email protected]>:

> Hi Benjamin and all,
>
> I'd like to talk about MESOS-1806. Since I took this ticket from halfway,
> and there was no design doc for it, I have created one based on the current
> implementation.
>
>
> https://docs.google.com/document/d/1ccY0XJoOODpIiGPllSVvl7t-YRrIEE_NavfbZHKPWBs/edit?usp=sharing
>
> Besides, there are some details I'd like to discuss:
>
>
> 1. Etcd servers won't accept requests from clients during the leader
> election phase. So when there is a leader re-election among the etcd
> servers, the request from the current master to renew the timestamp of the
> v2/keys/mesos node would fail, and the current code would immediately retry
> with the next server, which would refuse the request as well. Thus the
> master would exit because all servers rejected its requests. The same
> happens with slaves: the detector would fail after requests to all the etcd
> servers are refused. To solve this, we should add logic to wait for a while
> before trying the next server.
>
> 2. If the current master somehow fails to update the v2/keys/mesos node
> in time, that node would expire, and the detector would detect this and
> commit suicide due to loss of leadership. This is correct behavior, but the
> current TTL is rather small: only 5 seconds. The current master is set to
> update the node at 80% of the TTL, i.e. at the 4th second, which leaves only
> 1 second for the update to succeed, so the chance of hitting this problem is
> not that low, e.g. during a transient network problem. This can be mitigated
> by increasing the TTL to 10 seconds and letting the current master update the
> etcd node at 60% of the TTL, i.e. at the 6th second, leaving 4 seconds of
> margin.
>
> 3. The current implementation requires the list of masters to be specified
> in the "--masters=..." flag (used for the replicated log quorum). This
> makes it inconvenient to add new masters to the cluster: every existing
> master must be restarted with an updated "--masters=" flag. What about
> creating a directory in the etcd key space and letting each master create a
> child node in that directory?
>
>
> Regards,
> Shuai
>
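To make point 1 of the quoted proposal concrete, here is a minimal sketch of a client that tries each etcd server in turn and, when every server refuses (e.g. during an etcd leader election), waits with exponential backoff before starting another round instead of giving up immediately. It assumes the wait happens once per full round rather than before every single server; `request_with_backoff`, `send`, `AllServersFailed` and the delay values are all hypothetical, not part of the Mesos code base.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class AllServersFailed(Exception):
    """Raised when every server refused the request in every round."""


def request_with_backoff(
    servers: Sequence[str],
    send: Callable[[str], T],
    max_rounds: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each server; if all refuse, wait (exponential backoff) and retry.

    `send` is a hypothetical callable that performs one request against a
    single etcd server and raises on refusal or network failure.
    `sleep` is injectable so the waiting can be observed in tests.
    """
    delay = base_delay
    last_error: Optional[Exception] = None
    for round_no in range(max_rounds):
        for server in servers:
            try:
                return send(server)
            except Exception as err:  # e.g. refused during leader election
                last_error = err
        if round_no < max_rounds - 1:
            # Every server refused: give the etcd cluster time to elect a leader.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AllServersFailed(f"all {len(servers)} servers failed") from last_error
```

Whatever retry budget is chosen has to fit inside the node's TTL, which is why this interacts with point 2.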

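The arithmetic behind point 2 of the quoted proposal can be written down as a small sketch: the margin for a renewal to succeed is TTL minus the refresh point, i.e. 5 - 4 = 1 second today versus 10 - 6 = 4 seconds with the proposed settings. `LeaseSchedule` and `tolerates_outage` are hypothetical names, and the sketch assumes an outage that begins exactly at the refresh point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseSchedule:
    ttl: float            # seconds before the etcd node expires
    refresh_ratio: float  # fraction of the TTL at which the master renews it

    @property
    def refresh_after(self) -> float:
        """Seconds after the last successful update when the master renews."""
        return self.ttl * self.refresh_ratio

    @property
    def slack(self) -> float:
        """Time left for the renewal (including retries) before expiry."""
        return self.ttl - self.refresh_after

    def tolerates_outage(self, outage: float) -> bool:
        """True if an outage starting at the refresh point ends before expiry."""
        return outage < self.slack


current = LeaseSchedule(ttl=5, refresh_ratio=0.8)    # renews at 4 s, 1 s of slack
proposed = LeaseSchedule(ttl=10, refresh_ratio=0.6)  # renews at 6 s, 4 s of slack
```

With these numbers a 2-second network blip would cost the current master its leadership, while the proposed schedule would ride it out.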

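For point 3 of the quoted proposal, one possible shape is sketched below, assuming the etcd v2 keys API (a `PUT` with `value` and `ttl` form fields creates a child node; a `GET` on a directory lists its children under `node.nodes`). The directory `/v2/keys/mesos/masters` and the helper names are hypothetical; the sketch only builds the registration request and parses a listing, without doing any network I/O.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode

# Hypothetical key-space layout, not an existing Mesos convention.
MASTERS_DIR = "/v2/keys/mesos/masters"


def registration_request(master_id: str, address: str, ttl: int) -> Tuple[str, str, str]:
    """Describe the etcd v2 request a master would send to announce itself.

    A PUT with a TTL creates (or overwrites) the child node; the master would
    re-send it periodically, like the leader node, so entries for dead
    masters expire on their own.
    """
    body = urlencode({"value": address, "ttl": ttl})
    return "PUT", f"{MASTERS_DIR}/{master_id}", body


def masters_from_listing(response_body: str) -> Dict[str, str]:
    """Map master id -> address from the JSON of a GET on the directory."""
    node = json.loads(response_body).get("node", {})
    masters: Dict[str, str] = {}
    for child in node.get("nodes", []):  # "nodes" is absent for an empty dir
        master_id = child["key"].rsplit("/", 1)[-1]
        masters[master_id] = child.get("value", "")
    return masters
```

New masters would then only need to register themselves, and existing masters could read the directory instead of relying on a static "--masters=" list.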

-- 
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com
