[
https://issues.apache.org/jira/browse/MESOS-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Artem Harutyunyan updated MESOS-5576:
-------------------------------------
Sprint: Mesosphere Sprint 38
> Masters may drop the first message they send between masters after a network partition
> --------------------------------------------------------------------------------------
>
> Key: MESOS-5576
> URL: https://issues.apache.org/jira/browse/MESOS-5576
> Project: Mesos
> Issue Type: Improvement
> Components: leader election, master, replicated log
> Affects Versions: 0.28.2
> Environment: Observed in an OpenStack environment where each master
> lives on a separate VM.
> Reporter: Joseph Wu
> Assignee: Joseph Wu
> Labels: mesosphere
>
> We observed the following situation in a cluster of five masters:
> || Time || Master 1 || Master 2 || Master 3 || Master 4 || Master 5 ||
> | 0 | Follower | Follower | Follower | Follower | Leader |
> | 1 | Follower | Follower | Follower | Follower || Partitioned from cluster by downing this VM's network ||
> | 2 || Elected Leader by ZK | Voting | Voting | Voting | Suicides due to lost leadership |
> | 3 | Performs consensus | Replies to leader | Replies to leader | Replies to leader | Still down |
> | 4 | Performs writing | Acks to leader | Acks to leader | Acks to leader | Still down |
> | 5 | Leader | Follower | Follower | Follower | Still down |
> | 6 | Leader | Follower | Follower | Follower | Comes back up |
> | 7 | Leader | Follower | Follower | Follower | Follower |
> | 8 || Partitioned in the same way as Master 5 | Follower | Follower | Follower | Follower |
> | 9 | Suicides due to lost leadership || Elected Leader by ZK | Follower | Follower | Follower |
> | 10 | Still down | Performs consensus | Replies to leader | Replies to leader || Doesn't get the message! ||
> | 11 | Still down | Performs writing | Acks to leader | Acks to leader || Acks to leader ||
> | 12 | Still down | Leader | Follower | Follower | Follower |
>
> Master 2 sends a series of messages to the recently restarted Master 5. The
> first message is dropped, but subsequent messages are delivered.
>
> This appears to be due to a stale link between the masters. Before leader
> election, the replicated log actors create a network watcher, which adds
> links to masters that join the ZK group:
> https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/network.hpp#L157-L159
> This link does not appear to break (Master 2 -> 5) when Master 5 goes down,
> perhaps due to how the network partition was induced (in the hypervisor
> layer, rather than in the VM itself).
>
> When Master 2 tries to send a {{PromiseRequest}} to Master 5, we do not
> observe the [expected log message|https://github.com/apache/mesos/blob/7a23d0da817be4e8f68d96f524cecf802431033c/src/log/replica.cpp#L493-L494].
> Instead, we see a log line in Master 2:
> {code}
> process.cpp:2040] Failed to shutdown socket with fd 27: Transport endpoint is
> not connected
> {code}
> The broken link is removed by the libprocess {{socket_manager}} and the
> following {{WriteRequest}} from Master 2 to Master 5 succeeds via a new
> socket.
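
The sequence described above (a link that outlives the peer, one dropped message, then recovery over a new socket) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not libprocess or Mesos code, and the names Peer, Link, ToySocketManager, and run_scenario are hypothetical. It assumes the sender never receives a close notification when the peer is partitioned, so the stale link is discovered only when a send is attempted on it.

```python
class Peer:
    """Hypothetical stand-in for a remote master process."""

    def __init__(self, name):
        self.name = name
        self.incarnation = 0  # bumped every time the peer restarts
        self.inbox = []

    def restart(self):
        self.incarnation += 1


class Link:
    """A persistent connection bound to one incarnation of a peer."""

    def __init__(self, peer):
        self.peer = peer
        self.incarnation = peer.incarnation

    def is_stale(self):
        # The remote end this link was opened against no longer exists.
        return self.incarnation != self.peer.incarnation


class ToySocketManager:
    """Toy model of a sender that caches one link per peer."""

    def __init__(self):
        self.links = {}  # peer name -> Link
        self.log = []

    def link(self, peer):
        # Called once when the peer joins the group; the link is then reused.
        self.links.setdefault(peer.name, Link(peer))

    def send(self, peer, message):
        link = self.links.get(peer.name)
        if link is None:
            link = Link(peer)
            self.links[peer.name] = link
            self.log.append(f"opened new link to {peer.name}")
        if link.is_stale():
            # The failure is only noticed here: the message is lost and
            # the broken link is removed so the next send can reconnect.
            del self.links[peer.name]
            self.log.append(f"dropped {message}; removed stale link to {peer.name}")
            return False
        peer.inbox.append(message)
        return True


def run_scenario():
    master5 = Peer("master5")
    master2 = ToySocketManager()
    master2.link(master5)  # link created when Master 5 joins the group
    master5.restart()      # Master 5 goes down and returns; Master 2 is not told
    first = master2.send(master5, "PromiseRequest")
    second = master2.send(master5, "WriteRequest")
    return first, second, master5.inbox


if __name__ == "__main__":
    print(run_scenario())
```

In this model the sender has no way to learn that the link is stale until it writes to it, which matches the observation that only the first message after the partition is lost.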