Vinod Kone created MESOS-714:
--------------------------------
Summary: Slave should check if the (re-)registered message is from the
expected master
Key: MESOS-714
URL: https://issues.apache.org/jira/browse/MESOS-714
Project: Mesos
Issue Type: Bug
Reporter: Vinod Kone
Assignee: Vinod Kone
Fix For: 0.15.0
The following sequence of events happened in production at Twitter.
--> Slave registered with master A
--> A sent an ACK for registration but died immediately (user restart)
--> Slave detected a new master B and sent a re-register request
--> Slave then received the delayed ACK from A.
--> The bug here is that the slave accepted this ACK even though it was not
from master B.
--> Master B ignored the re-register request because it didn't know it was the
master yet!
--> Slave never re-tried its registration because it thinks it's registered with
B.
At this point the slave thinks it is registered, but the master (B) has no
record of it!
Fix: Slaves should check that (re-)registered messages are from the expected
master pid.
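
The proposed check can be illustrated with a minimal sketch. This is a toy
model in Python, not the actual Mesos slave code: the Slave class, its method
names, and the pids "master@A" and "master@B" are all hypothetical and exist
only to replay the sequence of events above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slave:
    """Toy model of a slave's registration state (hypothetical, not Mesos code)."""
    expected_master: Optional[str] = None  # pid of the most recently detected master
    registered: bool = False

    def detected(self, master_pid: str) -> None:
        # A new master was detected: only an ACK from this pid should count.
        self.expected_master = master_pid
        self.registered = False

    def on_registered(self, sender_pid: str) -> bool:
        # The proposed check: drop a (re-)registered ACK whose sender is not
        # the expected master, so the slave stays unregistered and can retry.
        if sender_pid != self.expected_master:
            return False
        self.registered = True
        return True


# Replay of the reported sequence, using made-up pids.
slave = Slave()
slave.detected("master@A")               # slave registers with master A
slave.detected("master@B")               # A dies, slave detects new master B
stale = slave.on_registered("master@A")  # delayed ACK from A arrives
registered_after_stale = slave.registered
fresh = slave.on_registered("master@B")  # ACK from B arrives after a retry
```

With the check in place the delayed ACK from A is dropped, so the slave still
considers itself unregistered and would keep retrying until B answers.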