Vinod Kone created MESOS-4712:
---------------------------------
Summary: Remove 'force' field from the Subscribe Call in v1 Scheduler API
Key: MESOS-4712
URL: https://issues.apache.org/jira/browse/MESOS-4712
Project: Mesos
Issue Type: Task
Reporter: Vinod Kone
Assignee: Vinod Kone
We/I introduced the `force` field in the SUBSCRIBE call to deal with scheduler
partition cases. Having thought about it a bit more and discussed it with a few
other folks ([~anandmazumdar], [~greggomann]), I think we can get away with not
having that field in the v1 API. The obvious advantage of removing the field is
that framework devs don't have to think about how/when to set it (the current
semantics are a bit confusing).
The new workflow when a master receives a SUBSCRIBE call is that the master
always accepts the call and closes any existing connection (after sending an
ERROR event) from the same scheduler (identified by framework id).
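To make the master side concrete, here is a rough sketch of that workflow as a
toy in-memory model. This is not Mesos code: `Connection`, `Master` and
`subscribe` are hypothetical names made up for illustration, and a "connection"
is just an object that records the events written to it.

```python
class Connection:
    """Hypothetical stand-in for a persistent subscribe connection."""

    def __init__(self, name):
        self.name = name
        self.open = True
        self.events = []  # events the master wrote to this connection

    def send(self, event):
        if not self.open:
            raise ConnectionError(f"{self.name} is closed")
        self.events.append(event)

    def close(self):
        self.open = False


class Master:
    """Toy model: at most one subscribe connection per framework id."""

    def __init__(self):
        self.subscribed = {}  # framework id -> Connection

    def subscribe(self, framework_id, conn):
        old = self.subscribed.get(framework_id)
        if old is not None and old is not conn:
            # Best effort: the old peer may already be gone.
            try:
                old.send("ERROR")
            except ConnectionError:
                pass
            old.close()
        # The new SUBSCRIBE is always accepted; there is no `force` check.
        self.subscribed[framework_id] = conn
        conn.send("SUBSCRIBED")


master = Master()
a = Connection("scheduler-A")
b = Connection("scheduler-B")
master.subscribe("framework-1", a)
master.subscribe("framework-1", b)  # replaces A: A gets ERROR and is closed
```

The point of the sketch is that the newest SUBSCRIBE for a framework id always
wins, so the scheduler never has to tell the master whether to force anything.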
The expectation from schedulers is that they must close the old subscribe
connection before sending a new SUBSCRIBE call.
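On the scheduler side this could look roughly like the sketch below. It is only
an illustration: `Scheduler`, `StubConnection`, `check_liveness` and the
"3 missed heartbeat intervals" threshold are all assumptions of mine, not part
of the v1 API. The clock is injected so the example can run without sleeping.

```python
import time


class StubConnection:
    """Hypothetical stand-in for an open SUBSCRIBE connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class Scheduler:
    def __init__(self, connect, heartbeat_interval, max_missed=3,
                 clock=time.monotonic):
        self.connect = connect        # opens a connection and sends SUBSCRIBE
        self.heartbeat_interval = heartbeat_interval
        self.max_missed = max_missed  # assumed policy, not from the ticket
        self.clock = clock
        self.conn = None
        self.last_event = None

    def subscribe(self):
        # Close the old subscribe connection before sending a new SUBSCRIBE.
        if self.conn is not None:
            self.conn.close()
        self.conn = self.connect()
        self.last_event = self.clock()

    def on_event(self, event):
        # Any event, including HEARTBEAT, shows the connection is alive.
        self.last_event = self.clock()

    def check_liveness(self):
        """Resubscribe if no event arrived for too long; True if it did."""
        silence = self.clock() - self.last_event
        if silence > self.max_missed * self.heartbeat_interval:
            self.subscribe()
            return True
        return False


now = [0.0]  # fake clock, advanced by hand below
scheduler = Scheduler(StubConnection, heartbeat_interval=15.0,
                      clock=lambda: now[0])
scheduler.subscribe()
first = scheduler.conn

now[0] = 30.0
scheduler.on_event("HEARTBEAT")
now[0] = 60.0
still_fine = scheduler.check_liveness()    # 30s of silence, under the 45s limit
now[0] = 80.0
resubscribed = scheduler.check_liveness()  # 50s of silence, over the limit
```

The only rule the sketch enforces is ordering: the old connection is closed
locally before the new SUBSCRIBE goes out.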
Let's look at some tricky scenarios and see how this works and why it is safe.
1) Disconnection @ the scheduler but not @ the master
Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends
ERROR on the old connection (won't be received by the scheduler because the
connection is already closed) and closes it.
2) Disconnection @ the master but not @ the scheduler
Scheduler realizes this from lack of HEARTBEAT events. It then closes its
existing connection and sends a new SUBSCRIBE call. Master accepts the new
SUBSCRIBE call. There is no old connection to close on the master as it is
already closed.
3) Scheduler failover but no disconnection @ master
Newly elected scheduler sends a SUBSCRIBE call. Master sends an ERROR event on
the old connection (it won't be received because the old scheduler has failed
over) and closes it.
4) Scheduler A got partitioned (but is alive and connected with the master) and
Scheduler B got elected as the new leader
When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the connection
from Scheduler A. Master accepts Scheduler B's connection. Typically Scheduler
A aborts after receiving ERROR and gets restarted. After restart it won't
become the leader because Scheduler B is already elected.
5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A)
and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then receives
SUBSCRIBE (A) but doesn't see A's disconnection yet.
Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends
ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE
(A) and tries to send the SUBSCRIBED event, the connection closure is detected.
Scheduler retries the SUBSCRIBE connection after a backoff. I think this race
is rare enough that it is unlikely to happen continuously in a loop.
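Scenario 5 is the least obvious one, so here is a small replay of it as a toy
model. Again this is a sketch and not Mesos code: `Conn`, `handle_subscribe`
and the `peer_closed` flag (meaning "the scheduler closed its end but the
master has not noticed yet") are hypothetical names used only for illustration.

```python
class Conn:
    def __init__(self, name, peer_closed=False):
        self.name = name
        self.peer_closed = peer_closed  # scheduler already closed its end
        self.closed_by_master = False
        self.events = []

    def send(self, event):
        if self.peer_closed or self.closed_by_master:
            raise ConnectionError(self.name)
        self.events.append(event)


def handle_subscribe(subscribed, framework_id, conn):
    """Toy master logic; returns True if `conn` ended up subscribed."""
    old = subscribed.pop(framework_id, None)
    if old is not None:
        try:
            old.send("ERROR")
        except ConnectionError:
            pass  # old peer is gone, nothing to tell it
        old.closed_by_master = True
    try:
        conn.send("SUBSCRIBED")
    except ConnectionError:
        return False  # closure detected; the framework is left unsubscribed
    subscribed[framework_id] = conn
    return True


subscribed = {}
a = Conn("A", peer_closed=True)  # scheduler timed out and closed A
b = Conn("B")

first = handle_subscribe(subscribed, "fw", b)   # B arrives first
second = handle_subscribe(subscribed, "fw", a)  # delayed A arrives later

c = Conn("C")
third = handle_subscribe(subscribed, "fw", c)   # scheduler retry after backoff
```

In this replay B is accepted and then torn down by the late SUBSCRIBE (A), A
fails as soon as the master writes to it, and the retry (C) succeeds as long as
no other stale SUBSCRIBE is still in flight.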