[ 
https://issues.apache.org/jira/browse/MESOS-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735569#comment-14735569
 ] 

Vaibhav Khanduja edited comment on MESOS-2863 at 9/8/15 10:46 PM:
------------------------------------------------------------------

@Tim,

Thanks for the feedback.

Tim and Vinod,

I would appreciate it if you could help me clear up a few details about 
libprocess, as that will help me put together the right fix. I understand that, 
among its other features, libprocess provides an interface for spawning a child 
process and a mechanism to wait on its exit status, as in the launchTask method 
in executor.cpp:


    // Monitor this process.
    process::reap(pid)
      .onAny(defer(self(),
                   &Self::reaped,
                   driver,
                   task.task_id(),
                   pid,
                   lambda::_1));

The reap call is followed by the status update call, which sends TASK_RUNNING.

I read the comment in reap.hpp, which goes like this:
.................................
// Returns the exit status of the specified process if and only if
// the process is a direct child and it has not already been reaped.
// Otherwise, returns None once the process has been reaped elsewhere
// (or does not exist, which is indistinguishable from being reaped
// elsewhere). This will never discard the returned future.
.............................

As I understand it, this means that for any change in the status of the child 
process the parent gets a signal (delivered by Unix), and libprocess helps by 
installing a handler for each of these signals. 
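
To make the semantics in that comment concrete, here is a minimal sketch using
plain POSIX waitpid. It is not the libprocess implementation; the helper names
reapExitCode and spawnChild are hypothetical, and -1 stands in for the "None"
result. It only illustrates that the exit status is available once, to the
direct parent, and that "already reaped" and "not my child" look the same.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper, NOT the libprocess API: blocks until `pid` changes
// state and returns its exit code if it is a direct child that has not been
// reaped yet and that exited normally. Returns -1 (standing in for "None")
// otherwise. waitpid fails with ECHILD both when `pid` is not our child and
// when it was already reaped, so the two cases cannot be told apart here.
int reapExitCode(pid_t pid)
{
  assert(pid > 0);

  int status = 0;
  pid_t result;
  do {
    result = ::waitpid(pid, &status, 0);
  } while (result == -1 && errno == EINTR);  // Retry if interrupted.

  if (result == pid && WIFEXITED(status)) {
    return WEXITSTATUS(status);
  }
  return -1;
}

// Hypothetical helper: forks a child that exits immediately with `code`
// and returns the child's pid to the parent (-1 if fork fails).
pid_t spawnChild(int code)
{
  pid_t pid = ::fork();
  if (pid == 0) {
    ::_exit(code);  // Child: exit without running parent cleanup.
  }
  return pid;
}
```

Calling reapExitCode twice on the pid returned by spawnChild(7) should give the
exit code the first time and the "None" stand-in the second time, since the
child has already been reaped by then.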

As for the bug, are we talking about the "sleep" on the second-to-last line of 
the "reaped" method?

........................
    // A hack for now ... but we need to wait until the status update
    // is sent to the slave before we shut ourselves down.
    os::sleep(Seconds(1));
......................

Does this mean that reap would be called when kill is called from shutdown? I 
guess it won't, or maybe it will? As per the comment in reap.hpp, if the child 
process has already been reaped (SIGCHLD received), reap won't be called?

docker/executor.cpp has similar code, except that it calls docker stop to kill 
the child processes.

Kindly share your thoughts on the exact conditions under which TASK_KILLED 
would be sent after TASK_FINISHED.





> Command executor can send TASK_KILLED after TASK_FINISHED
> ---------------------------------------------------------
>
>                 Key: MESOS-2863
>                 URL: https://issues.apache.org/jira/browse/MESOS-2863
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Vaibhav Khanduja
>              Labels: newbie++
>
> Observed this while doing some tests in our test cluster.
> If the command executor gets a shutdown() (e.g., framework unregistered) 
> after sending TASK_FINISHED but before exiting (there is a forced sleep), it 
> could send a TASK_KILLED update to the slave.
> Ideally the command executor should not send multiple terminal updates.
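
One way to read the race described above, and one possible guard against it,
is sketched below. This is an assumption-laden illustration, not the Mesos
command executor: the class SketchExecutor, its methods, and the `terminated`
flag are all hypothetical names. The idea is simply to remember that a
terminal update has gone out and to drop any later one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the command executor; the names here are
// illustrative only and do not match the Mesos sources.
class SketchExecutor
{
public:
  // Models the child exiting on its own (the path that sends TASK_FINISHED
  // and then sleeps before the executor exits).
  void reaped() { sendTerminal("TASK_FINISHED"); }

  // Models a shutdown arriving during that sleep (the path that would
  // otherwise send TASK_KILLED).
  void shutdown() { sendTerminal("TASK_KILLED"); }

  // Updates that were actually sent, in order.
  const std::vector<std::string>& sent() const { return updates; }

private:
  void sendTerminal(const std::string& state)
  {
    if (terminated) {
      return;  // A terminal update already went out; drop this one.
    }
    terminated = true;
    updates.push_back(state);
  }

  bool terminated = false;
  std::vector<std::string> updates;
};
```

With this guard, calling reaped() followed by shutdown() records only
TASK_FINISHED. Whether a flag like this belongs in the executor or the sleep
should be replaced altogether is a separate design question.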



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
