[ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114988#comment-16114988
 ] 

Benjamin Mahler commented on MESOS-7783:
----------------------------------------

The bug occurs as follows:

(1) Two (or more) tasks arrive at the agent, but do not yet reach 
{{Slave::_run}}.
(2) Kill task messages arrive at the agent and are processed.
(3) The first task to reach {{Slave::_run}} will cause the framework to be 
removed, since the pending tasks / executors are now empty (see 
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L1841-L1845]).
(4) The remaining tasks to reach {{Slave::_run}} encounter the framework as 
removed and are dropped without a status update (see 
[here|https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L1788-L1794]).
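
The four steps above can be replayed in a small toy model. This is only a sketch and not the Mesos code: {{ToyAgent}}, {{receive_task}}, {{kill_task}}, {{run}} and {{simulate}} are hypothetical names, and the model assumes that killing a still-pending task only erases it from the pending set and leaves the status update to the {{run}} step.

```python
# Toy model of the race described in steps (1)-(4); NOT the Mesos implementation.
# All class and method names are hypothetical and exist only for illustration.


class ToyAgent:
    def __init__(self):
        self.frameworks = {}      # framework id -> set of pending task ids
        self.status_updates = []  # (task id, state) pairs sent to the framework
        self.dropped = []         # tasks dropped without any status update

    def receive_task(self, framework_id, task_id):
        # Step (1): the task is pending but has not reached `run` yet.
        self.frameworks.setdefault(framework_id, set()).add(task_id)

    def kill_task(self, framework_id, task_id):
        # Step (2): ASSUMPTION of this model: the kill only erases the
        # pending entry and defers the status update to `run`.
        if framework_id in self.frameworks:
            self.frameworks[framework_id].discard(task_id)

    def run(self, framework_id, task_id):
        pending = self.frameworks.get(framework_id)
        if pending is None:
            # Step (4): the framework is already gone, so the task is
            # dropped and the framework never hears about it.
            self.dropped.append(task_id)
            return
        if task_id not in pending:
            # The task was killed while pending.
            self.status_updates.append((task_id, "TASK_KILLED"))
            if not pending:
                # Step (3): nothing is pending any more, so the whole
                # framework is removed, although other killed tasks
                # have not passed through `run` yet.
                del self.frameworks[framework_id]
            return
        pending.discard(task_id)
        self.status_updates.append((task_id, "TASK_RUNNING"))


def simulate():
    agent = ToyAgent()
    for task in ("t1", "t2"):
        agent.receive_task("f", task)  # step (1)
    for task in ("t1", "t2"):
        agent.kill_task("f", task)     # step (2)
    for task in ("t1", "t2"):
        agent.run("f", task)           # steps (3) and (4)
    return agent


if __name__ == "__main__":
    agent = simulate()
    print("status updates:", agent.status_updates)
    print("dropped silently:", agent.dropped)
```

In this model only {{t1}} produces a status update; {{t2}} ends up in {{dropped}}, which matches the symptom in the issue: the framework keeps waiting for an update that was never sent.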

> Framework might not receive status update when a just launched task is killed 
> immediately
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>            Priority: Critical
>              Labels: reliability
>         Attachments: GroupDeployIntegrationTest.log.zip, logs
>
>
> Our Marathon team is seeing failures in their integration test suite in which 
> Marathon gets stuck in an infinite loop trying to kill a just-launched task. 
> In their test, a task is launched and then immediately killed; the framework 
> does not, for example, wait for any task status update first.
> In this case the launch and kill messages arrive at the agent in the correct 
> order, but neither the launch path nor the kill path in the agent reaches the 
> point where a status update is sent to the framework. Since the framework has 
> seen no status update for the task, it re-triggers the kill, causing an 
> infinite loop.



