Vinod Kone created MESOS-4712:
---------------------------------

             Summary: Remove 'force' field from the Subscribe Call in v1 
Scheduler API
                 Key: MESOS-4712
                 URL: https://issues.apache.org/jira/browse/MESOS-4712
             Project: Mesos
          Issue Type: Task
            Reporter: Vinod Kone
            Assignee: Vinod Kone


We/I introduced the `force` field in the SUBSCRIBE call to deal with scheduler 
partition cases. Having thought about it a bit more and discussed it with a few 
other folks ([~anandmazumdar], [~greggomann]), I think we can get away with not 
having that field in the v1 API. The obvious advantage of removing the field is 
that framework devs don't have to think about how/when to set it (the current 
semantics are a bit confusing).

The new workflow when a master receives a SUBSCRIBE call is that the master 
always accepts the call and closes any existing connection from the same 
scheduler (identified by framework id), after sending an ERROR event on it.
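
To make the proposed workflow concrete, here is a minimal sketch in Python. It 
is a toy model, not Mesos code: `Connection`, `Master` and `subscribe` are 
hypothetical names, and events are plain strings rather than real v1 API 
messages. It only shows the rule that the newest SUBSCRIBE always wins.

```python
class Connection:
    """Hypothetical stand-in for one persistent SUBSCRIBE connection."""

    def __init__(self, name):
        self.name = name
        self.open = True
        self.events = []  # events the master wrote to this connection

    def send(self, event):
        if not self.open:
            raise ConnectionError(f"{self.name} is closed")
        self.events.append(event)

    def close(self):
        self.open = False


class Master:
    """Toy model: at most one subscribed connection per framework id."""

    def __init__(self):
        self.subscribed = {}  # framework id -> Connection

    def subscribe(self, framework_id, conn):
        old = self.subscribed.get(framework_id)
        if old is not None and old is not conn and old.open:
            # Always evict the previous subscriber; no 'force' flag is consulted.
            old.send("ERROR")
            old.close()
        self.subscribed[framework_id] = conn
        conn.send("SUBSCRIBED")


master = Master()
a = Connection("scheduler-A")
b = Connection("scheduler-B")
master.subscribe("framework-1", a)
master.subscribe("framework-1", b)
print(a.events, a.open)  # ['SUBSCRIBED', 'ERROR'] False
print(b.events, b.open)  # ['SUBSCRIBED'] True
```

The second SUBSCRIBE for the same framework id evicts the first connection 
without the caller having to decide anything, which is the point of dropping 
the field.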

Schedulers are expected to close their old subscribe connection before sending 
a new SUBSCRIBE call.

Let's look at some tricky scenarios and see how this works and why it is safe.

1) Connection is disconnected @ the scheduler but not @ the master
   
Scheduler sees the disconnection and sends a new SUBSCRIBE call. Master sends 
ERROR on the old connection (the scheduler won't receive it because its end of 
the connection is already closed), closes it, and accepts the new call.

2) Connection is disconnected @ the master but not @ the scheduler

Scheduler realizes this from lack of HEARTBEAT events. It then closes its 
existing connection and sends a new SUBSCRIBE call. Master accepts the new 
SUBSCRIBE call. There is no old connection to close on the master as it is 
already closed.
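
As a rough illustration of how a scheduler could notice the missing HEARTBEAT 
events, here is a small Python sketch. The names (`HeartbeatWatchdog`, 
`next_action`), the 15 second interval and the "three missed heartbeats" 
threshold are all assumptions made for the example, not values taken from 
Mesos. Time is passed in explicitly so the example is deterministic.

```python
class HeartbeatWatchdog:
    """Hypothetical helper: tracks when the last event arrived."""

    def __init__(self, interval_seconds, missed_allowed=3):
        self.interval = interval_seconds
        self.missed_allowed = missed_allowed  # assumed tolerance, not a Mesos value
        self.last_event_at = None

    def on_event(self, now):
        # Any event (SUBSCRIBED, HEARTBEAT, OFFERS, ...) proves the link is alive.
        self.last_event_at = now

    def is_stale(self, now):
        if self.last_event_at is None:
            return False
        return now - self.last_event_at > self.interval * self.missed_allowed


def next_action(watchdog, now):
    # On staleness the scheduler closes its connection first, then resubscribes.
    return "close_and_resubscribe" if watchdog.is_stale(now) else "keep_waiting"


watchdog = HeartbeatWatchdog(interval_seconds=15.0)
watchdog.on_event(now=0.0)   # SUBSCRIBED arrives
watchdog.on_event(now=15.0)  # HEARTBEAT arrives
print(next_action(watchdog, now=40.0))  # keep_waiting
print(next_action(watchdog, now=61.0))  # close_and_resubscribe
```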

3) Scheduler failover but no disconnection @ master

Newly elected scheduler sends a SUBSCRIBE call. Master sends an ERROR event on 
the old connection (it won't be received because the old scheduler has failed 
over) and closes it.

4) Scheduler A got partitioned (but is alive and connected with master) and 
Scheduler B got elected as the new leader

When Scheduler B sends SUBSCRIBE, master sends ERROR and closes the connection 
from Scheduler A. Master accepts Scheduler B's connection. Typically Scheduler 
A aborts after receiving ERROR and gets restarted. After restart it won't 
become the leader because Scheduler B is already elected.

5) Scheduler sends SUBSCRIBE, times out, closes the SUBSCRIBE connection (A) 
and sends a new SUBSCRIBE (B). Master receives SUBSCRIBE (B) and then receives 
SUBSCRIBE (A) but doesn't see A's disconnection yet.

Master first accepts SUBSCRIBE (B). After it receives SUBSCRIBE (A), it sends 
ERROR to SUBSCRIBE (B) and closes that connection. When it accepts SUBSCRIBE 
(A) and tries to send the SUBSCRIBED event, the connection closure is detected. 
The scheduler retries the SUBSCRIBE after a backoff. I think this race is rare 
enough that it is unlikely to repeat continuously in a loop.
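
The race in scenario 5 can be replayed with a toy model in Python. Everything 
here is hypothetical (`Connection`, `Master`, the `closed_by_scheduler` flag); 
the one assumption that matters is that the master only learns about a 
scheduler-side close when it tries to write to that connection.

```python
class Connection:
    """Hypothetical connection; the scheduler may have closed its end."""

    def __init__(self, name):
        self.name = name
        self.closed_by_scheduler = False
        self.closed_by_master = False
        self.events = []

    def send(self, event):
        # The master only discovers a scheduler-side close when it writes.
        if self.closed_by_scheduler:
            raise BrokenPipeError(self.name)
        self.events.append(event)


class Master:
    def __init__(self):
        self.subscribed = {}  # framework id -> Connection

    def subscribe(self, framework_id, conn):
        old = self.subscribed.pop(framework_id, None)
        if old is not None:
            try:
                old.send("ERROR")
            except BrokenPipeError:
                pass  # old peer is already gone; nothing to deliver
            old.closed_by_master = True
        try:
            conn.send("SUBSCRIBED")
        except BrokenPipeError:
            return False  # closure detected; nobody is subscribed now
        self.subscribed[framework_id] = conn
        return True


master = Master()
conn_a = Connection("A")
conn_b = Connection("B")
conn_a.closed_by_scheduler = True  # scheduler timed out and closed A

first = master.subscribe("framework-1", conn_b)   # B arrives first
second = master.subscribe("framework-1", conn_a)  # delayed A arrives second
print(first, second)   # True False
print(conn_b.events)   # ['SUBSCRIBED', 'ERROR']

# The scheduler sees ERROR on B, backs off, and retries on a fresh connection.
conn_c = Connection("C")
print(master.subscribe("framework-1", conn_c))  # True
```

In this replay the scheduler ends up subscribed after one extra round trip, 
which matches the argument that the race resolves itself unless the same 
reordering keeps recurring.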




