[ 
https://issues.apache.org/jira/browse/MESOS-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731460#comment-14731460
 ] 

Vaibhav Khanduja commented on MESOS-2863:
-----------------------------------------

Looking at the "reaped" method in docker/executor.cpp:

The method is called as soon as "any" status is received from "run", i.e. the 
running Docker container. The "stop.onAny" code then checks the inspect result 
to see whether inspect returned the status of the container process. Assume it 
was a batch process: the Docker container started and finished immediately, and 
the status returned was TASK_FINISHED. The executor sends out the status update 
and then waits in "sleep", assuming the time is enough for the message to be 
received at the slave end (BTW, this is problem "2" as mentioned in the 
comment). If a shutdown is received during this time, "shutdown" checks the 
values of "run" (for null) and "killed".

The bug is in the "if" condition, in the check on "killed". The process was 
never killed, so killed is false (the default value), and run is not "null", so 
the code enters the branch and calls "stop", which leads to "reaped" being 
called again, now with killed = true. The reaped method checks the value of 
killed and sends the status as "TASK_KILLED".

If the above analysis is right, one solution could be to change the "if" 
condition in "shutdown" so that it checks the task status instead of "killed". 
The task status is not currently a class variable, but it could be made one and 
used in place of "killed". 

if (run.isSome() && state == TASK_RUNNING) {

}

The state of the process could also be FAILED or FINISHED. Both of these look 
the same as "not killed" (!killed) to the current check, which is where the bug 
comes from. 
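
To make the sequence above concrete, here is a minimal, self-contained sketch of 
the two conditions side by side. It is only a model of the behaviour described 
in this comment, not the actual code in docker/executor.cpp: "ExecutorModel", 
"runPending", "checkState" and "updates" are names made up for illustration, the 
enum is a cut-down stand-in for the Mesos TaskState, and "stop" is collapsed 
into an immediate second call to "reaped".

```cpp
#include <vector>

// Cut-down stand-in for the Mesos TaskState enum (assumption: only the
// values discussed in this comment are modelled).
enum class TaskState { TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_KILLED };

// Hypothetical model of the executor, not the real class.
struct ExecutorModel {
  // checkState == false: current condition  (run.isSome() && !killed)
  // checkState == true:  proposed condition (run.isSome() && state == TASK_RUNNING)
  explicit ExecutorModel(bool checkState_) : checkState(checkState_) {}

  bool checkState;
  bool runPending = true;  // stands in for run.isSome()
  bool killed = false;     // default value, as in the analysis above
  TaskState state = TaskState::TASK_RUNNING;
  std::vector<TaskState> updates;  // status updates "sent" to the slave

  // Called when the container exits; picks the terminal state and sends it.
  void reaped(bool exitedCleanly) {
    if (killed) {
      state = TaskState::TASK_KILLED;
    } else {
      state = exitedCleanly ? TaskState::TASK_FINISHED : TaskState::TASK_FAILED;
    }
    updates.push_back(state);
  }

  // Called when a shutdown arrives, possibly during the forced sleep.
  void shutdown() {
    const bool shouldStop =
        checkState ? (runPending && state == TaskState::TASK_RUNNING)
                   : (runPending && !killed);
    if (shouldStop) {
      killed = true;  // what "stop" is assumed to do before reaping again
      reaped(true);
    }
  }
};
```

With the current condition, calling reaped(true) and then shutdown() records 
TASK_FINISHED followed by TASK_KILLED. With the proposed condition the same 
sequence records only TASK_FINISHED, because the state is no longer 
TASK_RUNNING when the shutdown arrives.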

Please let me know whether this analysis is correct. I will try to reproduce 
the issue as described above, though I am not sure I can, since this is a race 
condition. I can then make the code changes and put them up for review.
 

> Command executor can send TASK_KILLED after TASK_FINISHED
> ---------------------------------------------------------
>
>                 Key: MESOS-2863
>                 URL: https://issues.apache.org/jira/browse/MESOS-2863
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>              Labels: newbie++
>
> Observed this while doing some tests in our test cluster.
> If the command executor gets a shutdown() (e.g., framework unregistered) 
> after sending TASK_FINISHED but before exiting (there is a forced sleep), it 
> could send a TASK_KILLED update to the slave.
> Ideally the command executor should not send multiple terminal updates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
