[ 
https://issues.apache.org/jira/browse/MESOS-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1388:
-----------------------------------

    Assignee: Benjamin Mahler

Synced with [~vinodkone] on this. When the executor is terminated, we'll 
still need to send those tasks that are terminal and unacknowledged. It 
appears that the master handles this correctly, but we will need to confirm 
that with a more thorough examination of the code and some integration tests.
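
To make the intended slave behavior concrete, here is a minimal Python sketch 
of how the task list for a re-registration message could be chosen. It is a 
toy model, not the Mesos slave code: the Task and Executor classes, the 
`acknowledged` flag, and tasks_for_reregistration() are all hypothetical names 
introduced for illustration. The only point it makes is that a terminated 
executor should not suppress its terminal, unacknowledged tasks.

```python
from dataclasses import dataclass, field

# Terminal task states; the set is hard-coded here for the sketch.
TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


@dataclass
class Task:  # hypothetical model, not a Mesos type
    task_id: str
    state: str
    acknowledged: bool  # has the latest status update been acknowledged?


@dataclass
class Executor:  # hypothetical model, not a Mesos type
    executor_id: str
    terminated: bool
    tasks: list = field(default_factory=list)


def tasks_for_reregistration(executors):
    """Return the tasks a slave would include when re-registering.

    A task is reported if it is still non-terminal, or if it is terminal
    but its update has not been acknowledged. Whether the executor has
    terminated is deliberately ignored.
    """
    reported = []
    for executor in executors:
        for task in executor.tasks:
            if task.state not in TERMINAL_STATES or not task.acknowledged:
                reported.append(task)
    return reported


# Executor has terminated; T finished but the update was never acknowledged.
done = Executor("e1", terminated=True, tasks=[
    Task("T", "TASK_FINISHED", acknowledged=False),
    Task("U", "TASK_FINISHED", acknowledged=True),
])
reported_ids = [t.task_id for t in tasks_for_reregistration([done])]
```

In this model T is still reported even though its executor is gone, while U 
is omitted because its terminal update was already acknowledged.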

> Inconsistent terminal task state between master and re-registering slave
> ------------------------------------------------------------------------
>
>                 Key: MESOS-1388
>                 URL: https://issues.apache.org/jira/browse/MESOS-1388
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.19.0
>            Reporter: Vinod Kone
>            Assignee: Benjamin Mahler
>
> The following is a sequence of events that could result in master sending 
> TASK_LOST and then TASK_FINISHED for a task to a framework.
> --> Master failed over
> --> Slave tries to re-register with Master w/ a running task (T)
> --> Master starts re-admission into the registry
> --> Task finishes and slave removes it from its map
> --> The TASK_FINISHED status update is dropped by master as re-admission is 
> in progress
> --> The executor terminates on the slave.
> --> Slave retries re-registration (w/o task T) as master is still busy 
> re-admitting it and hasn't ACKed the re-registration yet
> --> Master finally finishes re-admission and re-adds slave with task T
> --> Master gets a duplicate/enqueued re-registration request (w/o task T) 
> that results in the master sending TASK_LOST during reconciliation.
> --> Master now gets retried TASK_FINISHED update from the slave which it 
> forwards to the scheduler.
> Normally, the slave re-registers and includes terminal unacknowledged tasks 
> in the message to the master. However, when the executor is terminated, the 
> slave does not send any of its tasks. This is problematic when there are 
> unacknowledged updates for tasks run by the executor.
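
The sequence of events in the description can be replayed with a small Python 
simulation. This is only a sketch under simplifying assumptions: the 
ToyMaster class and its method names are hypothetical, it tracks a single 
slave, and it reduces reconciliation to "any task the master knows about that 
the slave did not report is declared TASK_LOST". It is not how the Mesos 
master is implemented.

```python
class ToyMaster:  # hypothetical model of the master's view of one slave
    def __init__(self):
        self.tasks = {}           # task_id -> state, once the slave is admitted
        self.pending = {}         # tasks reported by the in-flight re-registration
        self.readmitting = False
        self.sent = []            # updates forwarded to the framework, in order

    def begin_readmission(self, reported):
        # First re-registration arrives; registry write is now in progress.
        self.readmitting = True
        self.pending = dict(reported)

    def finish_readmission(self):
        # Registry write completes; the slave is re-added with the tasks
        # from the original re-registration message.
        self.readmitting = False
        self.tasks = dict(self.pending)

    def status_update(self, task_id, state):
        if self.readmitting:
            return False          # dropped; the slave retries later
        self.sent.append((task_id, state))
        return True

    def duplicate_reregistration(self, reported):
        # Reconcile against a retried re-registration message.
        for task_id in list(self.tasks):
            if task_id not in reported:
                self.sent.append((task_id, "TASK_LOST"))
                del self.tasks[task_id]


master = ToyMaster()
master.begin_readmission({"T": "TASK_RUNNING"})          # slave reports T
dropped = not master.status_update("T", "TASK_FINISHED")  # during re-admission
master.finish_readmission()                               # slave re-added with T
master.duplicate_reregistration({})                       # retry sent w/o T
master.status_update("T", "TASK_FINISHED")                # slave's retried update
print(master.sent)
```

Under these assumptions the framework sees TASK_LOST followed by 
TASK_FINISHED for T, which is the inconsistency reported here. If the retried 
re-registration had still carried T, the reconciliation step in this model 
would not have produced TASK_LOST.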



--
This message was sent by Atlassian JIRA
(v6.2#6252)
