[ https://issues.apache.org/jira/browse/MESOS-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821711#comment-13821711 ]
Benjamin Mahler commented on MESOS-422:
---------------------------------------

[~wfarner] I prefer that option since it requires no change to what we store in ZK. We do have this notion in the code (the Group::Membership of the temporary new leader will not match the leading Group::Membership in ZK), but it unfortunately gets lost through the current layers of abstraction, since the detection and contending processes were made separate.

> Master leader election should be more robust to stale ephemeral nodes
> ---------------------------------------------------------------------
>
> Key: MESOS-422
> URL: https://issues.apache.org/jira/browse/MESOS-422
> Project: Mesos
> Issue Type: Bug
> Components: master
> Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0
> Reporter: Bill Farner
> Assignee: Yan Xu
> Priority: Minor
> Labels: twitter
> Fix For: 0.16.0
>
>
> When a leading master exits abruptly, it may fatefully restart and think it's
> the leader. If particularly unlucky, this could result in a set of masters
> that are indefinitely unstable.
> Sequence of events:
> - Master process becomes leader
> - Master process exits, session expiration counter begins
> - Master process restarts, reads leader node contents, and decides it's the
> leader (based on PID equality)
> - Previous master session expires, node is deleted
> - Master decides a different master is leader, commits suicide
> - Rinse, repeat for newly-created master node
> The salient fact here is that leaders should be concerned with "did I create
> the leader node" (ignoring node data) while clients want to be apprised of
> leader's node data.
> Relevant code:
> https://github.com/apache/mesos/blob/trunk/src/detector/detector.cpp#L548
> ZK leader election recipe, for reference:
> http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection

--
This message was sent by Atlassian JIRA
(v6.1#6144)
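
For illustration only, the sequence of events in the issue description can be replayed against a small in-memory model. This is a rough sketch, not Mesos or ZooKeeper code: `FakeGroup`, `Node`, the session ids, and the PID strings are all hypothetical stand-ins. It compares a leadership check based on node data (PID equality) with one based on "did I create the leader node".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Node:
    seq: int      # sequence number assigned when the node is created
    session: int  # hypothetical id of the session that owns the ephemeral node
    data: str     # payload read by clients, e.g. the master's PID string


class FakeGroup:
    """In-memory stand-in for a directory of ephemeral sequential nodes."""

    def __init__(self) -> None:
        self._nodes: Dict[int, Node] = {}
        self._next_seq = 0

    def join(self, session: int, data: str) -> Node:
        node = Node(self._next_seq, session, data)
        self._nodes[node.seq] = node
        self._next_seq += 1
        return node

    def expire(self, session: int) -> None:
        # Session expiration deletes every ephemeral node the session created.
        self._nodes = {s: n for s, n in self._nodes.items() if n.session != session}

    def leader(self) -> Optional[Node]:
        # The node with the lowest sequence number leads.
        return self._nodes[min(self._nodes)] if self._nodes else None


def leads_by_data(group: FakeGroup, my_pid: str) -> bool:
    # Fragile check: compares only the payload stored in the leader node.
    leader = group.leader()
    return leader is not None and leader.data == my_pid


def leads_by_ownership(group: FakeGroup, mine: Node) -> bool:
    # Robust check: "did I create the leader node", ignoring node data.
    leader = group.leader()
    return leader is not None and leader == mine


PID = "master@10.0.0.1:5050"  # made-up address
group = FakeGroup()
stale = group.join(session=1, data=PID)  # original leader, which then crashes
standby = group.join(session=2, data="master@10.0.0.2:5050")
restarted = group.join(session=3, data=PID)  # same PID, new session

before = (leads_by_data(group, PID), leads_by_ownership(group, restarted))
group.expire(1)  # the stale node is finally deleted
after = (leads_by_data(group, PID), leads_by_ownership(group, restarted))
```

In this model the data-based check flips from true to false once the stale session expires, which corresponds to the "decides a different master is leader" step, while the ownership-based check reports false both before and after.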