[jira] [Assigned] (MESOS-5195) Docker executor: task logs lost on shutdown

Benjamin Mahler (JIRA) Mon, 06 Jun 2016 16:59:21 -0700

     [ 
https://issues.apache.org/jira/browse/MESOS-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Benjamin Mahler reassigned MESOS-5195:
--------------------------------------

    Assignee: Benjamin Mahler

This looks to be a duplicate of MESOS-4279, I'll take this on since we don't 
currently have a responsive maintainer for the docker support.

> Docker executor: task logs lost on shutdown
> -------------------------------------------
>
>                 Key: MESOS-5195
>                 URL: https://issues.apache.org/jira/browse/MESOS-5195
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 0.27.2
>         Environment: Linux 4.4.2 "Ubuntu 14.04.2 LTS"
>            Reporter: Steven Schlansker
>            Assignee: Benjamin Mahler
>             Fix For: 1.0.0
>
>
> When you try to kill a task running in the Docker executor (in our case via 
> Singularity), the task shuts down cleanly but the last logs to standard out / 
> standard error are lost in teardown.
> For example, we run dumb-init.  With debugging on, you can see it should 
> write:
> {noformat}
> DEBUG("Forwarded signal %d to children.\n", signum);
> {noformat}
> If you attach strace to the process, you can see it clearly writes the text 
> to stderr.  But that message is lost and never is written to the sandbox 
> 'stderr' file.
> We believe the issue starts here, in Docker executor.cpp:
> {code}
>   void shutdown(ExecutorDriver* driver)
>   {
>     cout << "Shutting down" << endl;
>     if (run.isSome() && !killed) {
>       // The docker daemon might still be in progress starting the
>       // container, therefore we kill both the docker run process
>       // and also ask the daemon to stop the container.
>       // Making a mutable copy of the future so we can call discard.
>       Future<Nothing>(run.get()).discard();
>       stop = docker->stop(containerName, stopTimeout);
>       killed = true;
>     }
>   }
> {code}
> Notice how the "run" future is discarded *before* the Docker daemon is told 
> to stop -- now what will discarding it do?
> {code}
> void commandDiscarded(const Subprocess& s, const string& cmd)
> {
>   VLOG(1) << "'" << cmd << "' is being discarded";
>   os::killtree(s.pid(), SIGKILL);
> }
> {code}
> Oops, just sent SIGKILL to the entire process tree...
> You can see another (harmless?) side effect in the Docker daemon logs, it 
> never gets a chance to kill the task:
> {noformat}
> ERROR Handler for DELETE 
> /v1.22/containers/mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233
>  returned error: No such container: 
> mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233
> {noformat}
> I suspect that the fix is wait for 'docker->stop()' to complete before 
> discarding the 'run' future.
> Happy to provide more information if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Assigned] (MESOS-5195) Docker executor: task logs lost on shutdown

Reply via email to