[ 
https://issues.apache.org/jira/browse/MESOS-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Schlansker updated MESOS-2212:
-------------------------------------
    Description: 
Currently, the Docker containerizer executes "exit $(docker wait 
$CONTAINER_NAME)".  This misses a couple of edge cases in the 'docker wait' API 
-- notably, if an OOM condition occurs, it will return "-1" (which is not a 
valid exit code for sh, causing an error; see 
https://issues.apache.org/jira/browse/MESOS-2209).

If a Docker container OOMs, the 'docker inspect' output will set 
'State.OOMKilled' to 'true' and 'docker wait' will return -1.  This should be 
handled more gracefully.  In particular, setting the message to indicate that 
the OOM killer intervened would be very useful as then end users can know the 
real reason their task died.

{code}
    "State": {
        "Error": "",
        "ExitCode": -1,
        "FinishedAt": "2015-01-08T18:38:39.834089879Z",
        "OOMKilled": true,
        "Paused": false,
        "Pid": 0,
        "Restarting": false,
        "Running": false,
        "StartedAt": "2015-01-08T18:38:39.309034983Z"
    }
{code}

I've filed a bug on Docker as well: https://github.com/docker/docker/issues/9979
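
To illustrate the kind of handling suggested above, here is a minimal sketch. It is written in Python for brevity and is not the containerizer's actual code. The function names interpret_exit and wait_for_container are hypothetical, and mapping an OOM kill to exit status 137 (128 + SIGKILL) is an assumption, not something Mesos or Docker prescribes. The idea is to consult 'State.OOMKilled' from 'docker inspect' before trusting the value printed by 'docker wait', and to never pass an out-of-range status on to sh.

```python
import json
import subprocess

# Assumption: report an OOM kill as 128 + SIGKILL (9), the conventional
# shell status for a process terminated by SIGKILL.
OOM_EXIT_CODE = 137


def interpret_exit(wait_output, state):
    """Map `docker wait` output and the inspect "State" object to
    an (exit_code, message) pair with exit_code always in 0..255."""
    # Check the OOM flag first: in this case `docker wait` reports -1,
    # which carries no useful information on its own.
    if state.get("OOMKilled"):
        return OOM_EXIT_CODE, "Container was killed by the OOM killer"
    try:
        code = int(wait_output.strip())
    except ValueError:
        return 1, "docker wait returned unparseable output: %r" % wait_output
    if not 0 <= code <= 255:
        # e.g. -1, which sh rejects as an argument to `exit`
        return 1, "docker wait returned out-of-range status %d" % code
    return code, "Container exited with status %d" % code


def wait_for_container(name):
    """Run `docker wait`, then `docker inspect`, and interpret the result."""
    wait_output = subprocess.check_output(["docker", "wait", name]).decode()
    inspect_output = subprocess.check_output(["docker", "inspect", name]).decode()
    # `docker inspect` prints a JSON array with one object per container.
    state = json.loads(inspect_output)[0]["State"]
    return interpret_exit(wait_output, state)
```

With the State shown above, interpret_exit("-1", state) would yield the OOM message rather than an invalid exit code, and that message could then be attached to the task status.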

  was:
Currently, the Docker containerizer executes a "exit $(docker wait 
$CONTAINER_NAME)".  This misses a couple of edge cases in the 'docker wait' API 
-- notably, if an OOM condition occurs, it will return "-1" (which is not a 
valid exit code for sh, causing an error, see 
https://issues.apache.org/jira/browse/MESOS-2209.

If a Docker container OOMs, the 'docker inspect' output will set 
'State.OOMKilled' to 'true' and 'docker wait' will return -1.  This should be 
handled more gracefully.

{code}
    "State": {
        "Error": "",
        "ExitCode": -1,
        "FinishedAt": "2015-01-08T18:38:39.834089879Z",
        "OOMKilled": true,
        "Paused": false,
        "Pid": 0,
        "Restarting": false,
        "Running": false,
        "StartedAt": "2015-01-08T18:38:39.309034983Z"
    }
{code}

I've filed a but on Docker as well: https://github.com/docker/docker/issues/9979


> Better handling of errors during `docker wait`
> ----------------------------------------------
>
>                 Key: MESOS-2212
>                 URL: https://issues.apache.org/jira/browse/MESOS-2212
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.21.0
>            Reporter: Steven Schlansker
>
> Currently, the Docker containerizer executes "exit $(docker wait 
> $CONTAINER_NAME)".  This misses a couple of edge cases in the 'docker wait' 
> API -- notably, if an OOM condition occurs, it will return "-1" (which is not 
> a valid exit code for sh, causing an error; see 
> https://issues.apache.org/jira/browse/MESOS-2209).
> If a Docker container OOMs, the 'docker inspect' output will set 
> 'State.OOMKilled' to 'true' and 'docker wait' will return -1.  This should be 
> handled more gracefully.  In particular, setting the message to indicate that 
> the OOM killer intervened would be very useful as then end users can know the 
> real reason their task died.
> {code}
>     "State": {
>         "Error": "",
>         "ExitCode": -1,
>         "FinishedAt": "2015-01-08T18:38:39.834089879Z",
>         "OOMKilled": true,
>         "Paused": false,
>         "Pid": 0,
>         "Restarting": false,
>         "Running": false,
>         "StartedAt": "2015-01-08T18:38:39.309034983Z"
>     }
> {code}
> I've filed a bug on Docker as well: 
> https://github.com/docker/docker/issues/9979



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
