[
https://issues.apache.org/jira/browse/MESOS-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979631#comment-14979631
]
Vinod Kone commented on MESOS-2864:
-----------------------------------
Re-opened this because the submitted code has a bug, which was exposed by the
test in the linked ticket MESOS-3770.
Essentially, the master can receive a status update that carries two states
(the update state and the latest state). The update state is the state the
update was sent for, whereas the latest state is the latest state of the task
according to the slave. See details here:
https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L81
This is how the bug manifests:
--> Master receives an update with TASK_RUNNING update (UUID:R) state (and
TASK_FINISHED (UUID:F) latest state).
--> Master::updateTask() updates task->state to TASK_FINISHED and sets
task->status_update_uuid to UUID:R
--> Master receives ACK for UUID:R which it forwards to slave
--> Master receives an update with TASK_FINISHED update (UUID:F) state (and
TASK_FINISHED (UUID:F) latest state)
--> Master::updateTask() returns immediately because the task is already
terminated *without* setting task->status_update_uuid
--> Master receives ACK for UUID:F which it is not waiting for and hence
ignores it and doesn't remove the terminated task from its map!
--> At this point the slave has removed the task from its map but the master
hasn't!
The fix is simple. updateTask() should properly set the uuid of the expected
ACK.
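The sequence above can be replayed with a small, self-contained model. This is
only a sketch under stated assumptions, not the actual Mesos code: the Task and
StatusUpdate structs, the acknowledge() and replayScenario() helpers, the
statusUpdateState field and the "fixed" flag are all hypothetical stand-ins
introduced for illustration.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the Mesos types (not the real API).
enum class TaskState { TASK_STAGING, TASK_RUNNING, TASK_FINISHED };

struct StatusUpdate {
  TaskState updateState;  // State this update is sent for.
  TaskState latestState;  // Latest state of the task according to the slave.
  std::string uuid;       // UUID of this update.
};

struct Task {
  TaskState state = TaskState::TASK_STAGING;
  // Assumed bookkeeping: state and UUID of the update whose ACK is expected.
  TaskState statusUpdateState = TaskState::TASK_STAGING;
  std::string statusUpdateUuid;
  bool removed = false;  // True once the master drops the task from its map.
};

bool isTerminal(TaskState state) {
  return state == TaskState::TASK_FINISHED;
}

// Models Master::updateTask(). With fixed == false it reproduces the early
// return that skips recording the UUID of the expected ACK.
void updateTask(Task& task, const StatusUpdate& update, bool fixed) {
  if (isTerminal(task.state)) {
    if (fixed) {
      // Keep task.state unchanged, but remember which ACK to wait for.
      task.statusUpdateState = update.updateState;
      task.statusUpdateUuid = update.uuid;
    }
    return;
  }

  task.state = update.latestState;
  task.statusUpdateState = update.updateState;
  task.statusUpdateUuid = update.uuid;
}

// Models ACK handling: an unexpected UUID is ignored, and the task is removed
// only when the acknowledged update was terminal.
void acknowledge(Task& task, const std::string& uuid) {
  if (task.statusUpdateUuid != uuid) {
    return;
  }
  if (isTerminal(task.statusUpdateState)) {
    task.removed = true;
  }
}

// Replays the four steps described above and reports whether the master
// ends up removing the terminated task.
bool replayScenario(bool fixed) {
  Task task;
  updateTask(
      task,
      StatusUpdate{TaskState::TASK_RUNNING, TaskState::TASK_FINISHED, "R"},
      fixed);
  acknowledge(task, "R");
  updateTask(
      task,
      StatusUpdate{TaskState::TASK_FINISHED, TaskState::TASK_FINISHED, "F"},
      fixed);
  acknowledge(task, "F");
  return task.removed;
}
```

In this model replayScenario(false) returns false (the ACK for UUID:F is
ignored and the task stays in the map), while replayScenario(true) returns
true because the expected ACK UUID is recorded even for an already terminated
task.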
> Master should not change the state of a terminal task if it receives another
> terminal update
> --------------------------------------------------------------------------------------------
>
> Key: MESOS-2864
> URL: https://issues.apache.org/jira/browse/MESOS-2864
> Project: Mesos
> Issue Type: Bug
> Reporter: Vinod Kone
> Assignee: Yong Qiao Wang
> Fix For: 0.26.0
>
>
> Currently, when the master receives a terminal update for an already
> terminated (but unacknowledged) task it changes the state to the latest
> update. This is confusing because the slave doesn't change the state of the
> task in such a case. Master should just forward the update without changing
> the task state.