[ 
https://issues.apache.org/jira/browse/MESOS-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979631#comment-14979631
 ] 

Vinod Kone commented on MESOS-2864:
-----------------------------------

Re-opened this because the submitted code has a bug, which was exposed by the 
test in the linked ticket MESOS-3770.

Essentially, the master can receive a status update which contains 2 states 
inside it (update state and latest state). The update state reflects the state 
the update is sent for, whereas the latest state indicates the latest state of 
the task according to the slave. See details here: 
https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L81

This is how the bug manifests:

--> Master receives an update with TASK_RUNNING update (UUID:R) state (and 
TASK_FINISHED (UUID:F) latest state).
--> Master::updateTask() updates task->state to TASK_FINISHED and sets 
task->status_update_uuid to UUID:R 
--> Master receives ACK for UUID:R which it forwards to slave
--> Master receives an update with TASK_FINISHED update (UUID:F) state (and 
TASK_FINISHED (UUID:F) latest state)
--> Master::updateTask() returns immediately because the task is already 
terminated *without* setting task->status_update_uuid to UUID:F
--> Master receives ACK for UUID:F, which it is not waiting for, so it 
ignores the ACK and doesn't remove the terminated task from its map!
--> At this point the slave has removed the task from its map but the master 
hasn't!

The fix is simple. updateTask() should properly set the uuid of the expected 
ACK.
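
To make the sequence above concrete, here is a small Python model of it. This 
is only a sketch under stated assumptions, not the actual Mesos C++ code: the 
class and method names (Master, update_task, acknowledge, replay) and the 
status_update_state field are hypothetical stand-ins, and the ACK handling 
(remove the task only when the ACK matches the UUID of a recorded terminal 
update) is my simplification of what the master does.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TERMINAL_STATES = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST"}


@dataclass
class StatusUpdate:
    task_id: str
    state: str         # "update state": the state this update is sent for
    uuid: str          # UUID of this update, echoed back in the ACK
    latest_state: str  # latest state of the task according to the slave


@dataclass
class Task:
    state: str = "TASK_STAGING"
    status_update_state: Optional[str] = None  # hypothetical field
    status_update_uuid: Optional[str] = None


class Master:
    """Toy model of the master's task map (an assumption, not Mesos code)."""

    def __init__(self, fixed: bool) -> None:
        self.fixed = fixed
        self.tasks: Dict[str, Task] = {}

    def update_task(self, update: StatusUpdate) -> None:
        task = self.tasks[update.task_id]
        if task.state in TERMINAL_STATES:
            if self.fixed:
                # Fix: keep task.state, but still record which ACK to expect.
                task.status_update_state = update.state
                task.status_update_uuid = update.uuid
            # Buggy path: return without recording the new UUID.
            return
        task.state = update.latest_state
        task.status_update_state = update.state
        task.status_update_uuid = update.uuid

    def acknowledge(self, task_id: str, uuid: str) -> None:
        task = self.tasks.get(task_id)
        if task is None:
            return
        # Remove the task only for the ACK of the recorded terminal update.
        if (task.status_update_state in TERMINAL_STATES
                and task.status_update_uuid == uuid):
            del self.tasks[task_id]


def replay(fixed: bool) -> bool:
    """Replay the sequence; return True if the master still holds the task."""
    master = Master(fixed)
    master.tasks["t1"] = Task()
    master.update_task(
        StatusUpdate("t1", "TASK_RUNNING", "UUID:R", "TASK_FINISHED"))
    master.acknowledge("t1", "UUID:R")
    master.update_task(
        StatusUpdate("t1", "TASK_FINISHED", "UUID:F", "TASK_FINISHED"))
    master.acknowledge("t1", "UUID:F")
    return "t1" in master.tasks
```

In this model, replay(fixed=False) leaves the terminated task in the master's 
map, while replay(fixed=True) removes it once the ACK for UUID:F arrives.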

> Master should not change the state of a terminal task if it receives another 
> terminal update
> --------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2864
>                 URL: https://issues.apache.org/jira/browse/MESOS-2864
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Yong Qiao Wang
>             Fix For: 0.26.0
>
>
> Currently, when the master receives a terminal update for an already 
> terminated (but unacknowledged) task it changes the state to the latest 
> update. This is confusing because the slave doesn't change the state of the 
> task in such a case. Master should just forward the update without changing 
> the task state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
