[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

Shuai Lin (JIRA) Sat, 05 Dec 2015 07:28:04 -0800

    [ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043346#comment-15043346
 ]


Shuai Lin commented on MESOS-1806:
----------------------------------

There are two situations to handle:

* Etcd servers won't accept requests from clients during the leader election 
phase. So when there is a leader re-election among the etcd servers, the 
request from the current master to renew the timestamp of the {{v2/keys/mesos}} 
node would fail, and the current code would immediately retry with the next 
server, which would refuse the request as well. Thus the master would exit 
because every server has failed its request. The same happens with slaves: the 
detector would fail after requests to all the etcd servers are refused. To 
solve this, we can add logic to wait for a while before trying the next server.

* If the current master somehow fails to update the {{v2/keys/mesos}} node 
in time, that node would expire, the detector would detect this, and the 
master would commit suicide due to loss of leadership. This is correct 
behavior, but the current TTL is rather small: only 5 seconds, and the current 
master is set to update the node at 80% of the TTL, i.e. after 4 seconds. That 
leaves only 1 second of slack, so the chance of hitting this problem is not 
that low, e.g. during a transient network problem. This can be mitigated by 
increasing the TTL to 10 seconds, or by letting the current master try to 
update the node at 60% of the TTL.
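
To make the two proposals concrete, here is a minimal Python sketch. It is not 
the Mesos code (which is C++), and every name in it ({{refresh_slack}}, 
{{try_servers}}, {{rounds}}, {{pause}}) is hypothetical. {{request}} stands in 
for whatever call renews the {{v2/keys/mesos}} node and is assumed to return 
False when a server refuses it.

```python
import time
from typing import Callable, Optional, Sequence


def refresh_slack(ttl: float, refresh_fraction: float) -> float:
    """Seconds between the scheduled refresh and the expiry of the key."""
    return ttl * (1.0 - refresh_fraction)


def try_servers(
    servers: Sequence[str],
    request: Callable[[str], bool],
    rounds: int = 3,
    pause: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first server that accepts the request, or None.

    Rather than failing after one immediate pass over the servers, wait
    `pause` seconds before each further attempt, so that an etcd leader
    election has a chance to finish. `rounds` caps the number of passes.
    """
    attempts = [server for _ in range(rounds) for server in servers]
    for i, server in enumerate(attempts):
        if i > 0:
            sleep(pause)  # wait before trying the next server
        if request(server):
            return server
    return None
```

With the figures above, {{refresh_slack(5, 0.8)}} is about 1 second, while both 
{{refresh_slack(10, 0.8)}} and {{refresh_slack(5, 0.6)}} come to about 2 
seconds. The values of {{rounds}} and {{pause}} are placeholders; suitable 
values would depend on how long an etcd election takes in practice.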

[~cmaloney] [~benjaminhindman] What do you think?

> Substituting etcd for Zookeeper
> -------------------------------
>
>                 Key: MESOS-1806
>                 URL: https://issues.apache.org/jira/browse/MESOS-1806
>             Project: Mesos
>          Issue Type: Task
>          Components: leader election
>            Reporter: Ed Ropple
>            Assignee: Shuai Lin
>            Priority: Minor
>
> <adam_mesos>   eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
