[ 
https://issues.apache.org/jira/browse/MESOS-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086450#comment-16086450
 ] 

Benjamin Mahler commented on MESOS-7783:
----------------------------------------

Took a quick look at the code. This comment [1] in the kill task handling of 
the agent says that we avoid removing the framework so that the TASK_KILLED 
message can be sent later. However, when the launch path later discovers that 
the task was killed, the framework appears to have already been removed and 
we don't generate the update [2].
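
To make the suspected sequence concrete, here is a toy Python model of it. This 
is only a sketch under assumptions: it is not the agent code, every name in it 
(ToyAgent, run_task, kill_task, remove_framework, continue_launch, simulate) is 
hypothetical, and remove_framework stands in for whatever removes the framework 
in the interim, which is still unknown.

```python
# Toy model of the suspected race. All names are hypothetical; this is not
# Mesos code and does not claim to mirror the agent's real data structures.

class ToyAgent:
    def __init__(self):
        self.pending = {}    # framework id -> set of task ids still launching
        self.killed = set()  # task ids killed before their launch finished
        self.updates = []    # (framework id, task id, state) sent upstream

    def run_task(self, framework_id, task_id):
        # The launch is asynchronous, so the task is only recorded as pending.
        self.pending.setdefault(framework_id, set()).add(task_id)

    def kill_task(self, framework_id, task_id):
        # Kill path: note the kill but keep the framework, expecting the
        # launch path to send TASK_KILLED later. No update is sent here.
        if task_id in self.pending.get(framework_id, set()):
            self.killed.add(task_id)

    def remove_framework(self, framework_id):
        # Stand-in for the unexplained removal; its real trigger is unknown.
        self.pending.pop(framework_id, None)

    def continue_launch(self, framework_id, task_id):
        # Launch path resumes. If the framework is gone, return without
        # generating any update.
        tasks = self.pending.get(framework_id)
        if tasks is None:
            return
        tasks.discard(task_id)
        state = "TASK_KILLED" if task_id in self.killed else "TASK_RUNNING"
        self.updates.append((framework_id, task_id, state))


def simulate(remove_in_interim):
    # Launch, kill immediately, optionally lose the framework, then let the
    # launch path continue; return the updates the framework would see.
    agent = ToyAgent()
    agent.run_task("fw", "task")
    agent.kill_task("fw", "task")
    if remove_in_interim:
        agent.remove_framework("fw")
    agent.continue_launch("fw", "task")
    return agent.updates
```

In this model, simulate(False) yields a single TASK_KILLED update, while 
simulate(True) yields none, which matches the framework never hearing back 
about the task.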

It appears that somehow the framework gets removed in the interim, but the 
removal does not show up in the logs. [~bbannier], these agent logs appear to 
be filtered on the task id; do you have the full agent logs? That should help 
reveal the cause.

[1] 
https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L2473-L2477
[2] 
https://github.com/apache/mesos/blob/1.2.0/src/slave/slave.cpp?utf8=%E2%9C%93#L1788-L1794

> Framework might not receive status update when a just launched task is killed 
> immediately
> -----------------------------------------------------------------------------------------
>
>                 Key: MESOS-7783
>                 URL: https://issues.apache.org/jira/browse/MESOS-7783
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.2.0
>            Reporter: Benjamin Bannier
>         Attachments: logs
>
>
> Our Marathon team is seeing issues in their integration test suite where 
> Marathon gets stuck in an infinite loop trying to kill a just-launched task. 
> In their test, a task launch is immediately followed by a kill of that 
> task; the framework does not, e.g., wait for any task status update.
> In this case the launch and kill messages arrive at the agent in the correct 
> order, but neither the launch path nor the kill path in the agent reaches the 
> point where a status update is sent to the framework. Since the framework has 
> seen no status update on the task, it re-triggers a kill, causing an infinite 
> loop.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
