[
https://issues.apache.org/jira/browse/MESOS-4671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146809#comment-15146809
]
Avinash Sridharan commented on MESOS-4671:
------------------------------------------
From the logs attached to MESOS-4661 it does look like the status updates for
TASK_RUNNING and TASK_FINISHED got inverted. Here are the relevant logs:
I0212 00:23:11.639288 731 slave.cpp:3002] Handling status update TASK_RUNNING
(UUID: b0810058-a1c0-4918-b699-61c16eb0f46a) for task 1 of framework
6508f198-e145-4d76-844f-0460dc5d7d39-0000
I0212 00:23:11.639622 731 slave.cpp:3002] Handling status update
TASK_FINISHED (UUID: 4674114a-c022-4945-ac98-52c0fae325e5) for task 1 of
framework 6508f198-e145-4d76-844f-0460dc5d7d39-0000
I0212 00:23:11.641832 729 slave.cpp:5661] Terminating task 1
I0212 00:23:11.643185 721 status_update_manager.cpp:320] Received status
update TASK_FINISHED (UUID: 4674114a-c022-4945-ac98-52c0fae325e5) for task 1 of
framework 6508f198-e145-4d76-844f-0460dc5d7d39-0000
I0212 00:23:11.643405 7480 executor.cpp:588] Enqueuing event SUBSCRIBED
received from http://172.17.0.2:57200/slave/api/v1/executor
I0212 00:23:11.643427 721 status_update_manager.cpp:497] Creating
StatusUpdate stream for task 1 of framework
6508f198-e145-4d76-844f-0460dc5d7d39-0000
I0212 00:23:11.644057 721 status_update_manager.cpp:824] Checkpointing UPDATE
for status update TASK_FINISHED (UUID: 4674114a-c022-4945-ac98-52c0fae325e5)
for task 1 of framework 6508f198-e145-4d76-844f-0460dc5d7d39-0000
Received a SUBSCRIBED event
Since `dispatch` preserves the causal order of events within a `process`, we
should never see events re-ordered as long as the messages pass through the
same set of `processes`. However, the implementation of the `status` method in
`MesosContainerizer` uses `await` to collect all the `ContainerStatus`
`Futures` from the isolators before completing the `Promise` given to the
agent. The `await` method internally launches a `process` to wait for these
futures. Because of this, multiple `StatusUpdate` messages might end up being
processed by different sets of `libprocess` threads, causing a race.
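One way to picture a remedy, purely as a sketch and not the change made in
Mesos: record each update when it arrives, and forward only the completed
prefix of that queue, so a container status lookup that finishes early cannot
overtake an earlier update. `OrderedForwarder`, `PendingUpdate`, `enqueue` and
`complete` are hypothetical names invented for this illustration. The sketch
uses plain standard C++ rather than libprocess, and assumes each ticket is
completed exactly once.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a status update that is waiting for its
// container status lookup to complete.
struct PendingUpdate {
  std::string state;  // e.g. "TASK_RUNNING"
  bool ready;         // true once the simulated lookup has completed
};

// Forwards updates strictly in arrival order, regardless of the order in
// which the individual lookups complete.
class OrderedForwarder {
public:
  // Records an update at arrival time and returns a ticket that is later
  // passed to complete().
  std::size_t enqueue(const std::string& state) {
    pending_.push_back(PendingUpdate{state, false});
    return base_ + pending_.size() - 1;
  }

  // Marks the lookup for `ticket` as done, then forwards every update at
  // the front of the queue that is ready. Assumes the ticket is valid and
  // has not been completed before.
  void complete(std::size_t ticket) {
    pending_[ticket - base_].ready = true;
    while (!pending_.empty() && pending_.front().ready) {
      forwarded_.push_back(pending_.front().state);
      pending_.pop_front();
      ++base_;  // tickets below base_ have already been forwarded
    }
  }

  // Updates forwarded so far, in the order they were sent.
  const std::vector<std::string>& forwarded() const { return forwarded_; }

private:
  std::deque<PendingUpdate> pending_;
  std::size_t base_ = 0;
  std::vector<std::string> forwarded_;
};
```

With this structure, if the lookup for TASK_FINISHED completes before the one
for TASK_RUNNING, nothing is forwarded until TASK_RUNNING is also ready, and
then both go out in their original order.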
> Status updates from executor can be forwarded out of order by the Agent.
> ------------------------------------------------------------------------
>
> Key: MESOS-4671
> URL: https://issues.apache.org/jira/browse/MESOS-4671
> Project: Mesos
> Issue Type: Bug
> Components: containerization, HTTP API
> Affects Versions: 0.28.0
> Reporter: Anand Mazumdar
> Assignee: Avinash Sridharan
> Labels: mesosphere
>
> Previously, all status update messages from the executor were forwarded by
> the agent to the master in the order that they had been received.
> However, that seems to be no longer valid due to a recently introduced change
> in the agent:
> {code}
> // Before sending update, we need to retrieve the container status.
> containerizer->status(executor->containerId)
>   .onAny(defer(self(),
>                &Slave::_statusUpdate,
>                update,
>                pid,
>                executor->id,
>                lambda::_1));
> {code}
> This can sometimes lead to status updates being sent out of order, depending
> on the order in which the {{Future}}s returned by the calls to
> {{status(...)}} are fulfilled.