[
https://issues.apache.org/jira/browse/MESOS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622099#comment-16622099
]
Qian Zhang commented on MESOS-9231:
-----------------------------------
I added some logging to Mesos's Docker library (`src/docker/docker.cpp`) and
reproduced this issue again. The incomplete result returned by `docker inspect`
is shown below; it indeed contains no Docker container ID.
{code:javascript}
[
  {
    "Driver": "rexray",
    "Labels": null,
    "Mountpoint": "/",
    "Name": "",
    "Options": {},
    "Scope": "global",
    "Status": {
      "availabilityZone": "",
      "fields": null,
      "iops": 0,
      "name": "",
      "server": "ebs",
      "service": "ebs",
      "size": 0,
      "type": ""
    }
  }
]
{code}
The Docker version on the agent host is 1.13.1, which is fairly old; I suspect
newer versions of Docker may not have this issue.
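The check that fails here can be sketched in Python. `has_container_id` is a hypothetical helper, not Mesos's actual C++ parsing code in `src/docker/docker.cpp`; it only assumes that `docker inspect` prints a JSON array and that a complete container result carries a non-empty `Id` field, which the result above lacks. The "complete" sample and its ID are made up for illustration.

{code:python}
import json


def has_container_id(inspect_output: str) -> bool:
    """Return True if `docker inspect` output holds one object with a non-empty "Id".

    Hypothetical helper for illustration only: a complete container result
    carries an "Id" field, while the incomplete result captured above does not.
    """
    try:
        objects = json.loads(inspect_output)
    except json.JSONDecodeError:
        # Unparseable output is treated the same as an incomplete result.
        return False
    # Inspecting a single name is expected to yield exactly one object.
    if not isinstance(objects, list) or len(objects) != 1:
        return False
    first = objects[0]
    return isinstance(first, dict) and bool(first.get("Id"))


# The incomplete result captured above (abbreviated).
incomplete = '[{"Driver": "rexray", "Mountpoint": "/", "Name": "", "Scope": "global"}]'
# A made-up complete result; the ID value is a placeholder.
complete = '[{"Id": "0123abcd"}]'

print(has_container_id(incomplete))  # False
print(has_container_id(complete))    # True
{code}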
> `docker inspect` may return an incomplete result to Docker executor due to a
> race condition
> -------------------------------------------------------------------------------------------
>
> Key: MESOS-9231
> URL: https://issues.apache.org/jira/browse/MESOS-9231
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.4.2, 1.5.1, 1.6.1
> Reporter: Qian Zhang
> Assignee: Qian Zhang
> Priority: Major
>
> In the Docker executor (`src/docker/executor`), we call `docker inspect`
> right after `docker run`
> ([https://github.com/apache/mesos/blob/1.6.0/src/docker/executor.cpp#L230:L242]),
> and there is a small chance that `docker inspect` returns an incomplete result
> that does not contain the Docker container ID. When that happens, we see an
> error like the one below:
> {code:java}
> E0830 00:09:37.303499 2428 executor.cpp:385] Failed to inspect container
> 'mesos-eaa4f455-0a2c-47ff-bf98-8bd0ad243740': Unable to create container:
> Unable to find Id in container
> {code}
> If that happens, the Docker executor will not send a `TASK_RUNNING` status
> update, so the task will be stuck in `TASK_STARTING`.
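One possible mitigation, sketched in Python under stated assumptions rather than taken from the Mesos code: treat a result without an `Id` as transient and re-run `docker inspect` a bounded number of times. `inspect_until_id` and its `inspect` callable are hypothetical names, and the attempt count and delay are arbitrary choices.

{code:python}
import json
import time
from typing import Callable, Optional


def inspect_until_id(
    inspect: Callable[[], str],
    attempts: int = 5,
    delay_secs: float = 0.5,
) -> Optional[str]:
    """Call `inspect` until it yields a result with a non-empty "Id".

    Hypothetical sketch: `inspect` returns the raw stdout of
    `docker inspect <name>`. Returns the container ID, or None if every
    attempt produced an incomplete or unparseable result.
    """
    for attempt in range(attempts):
        try:
            objects = json.loads(inspect())
        except json.JSONDecodeError:
            objects = []
        if isinstance(objects, list) and objects and isinstance(objects[0], dict):
            container_id = objects[0].get("Id")
            if container_id:
                return container_id
        # Sleep between attempts, but not after the last one.
        if attempt + 1 < attempts:
            time.sleep(delay_secs)
    return None


# Simulated race: the first call returns the incomplete result, the second
# a made-up complete one.
responses = iter([
    '[{"Driver": "rexray", "Name": ""}]',
    '[{"Id": "0123abcd", "Name": "/mesos-example"}]',
])
print(inspect_until_id(lambda: next(responses), delay_secs=0.0))  # 0123abcd
{code}

In a real executor the `inspect` callable might wrap `subprocess.run(["docker", "inspect", name], capture_output=True, text=True).stdout`; injecting it as a callable keeps the retry logic testable without a Docker daemon. Retrying only masks the race, so the loop is bounded, and a `None` result would still need to be reported as a failure rather than leaving the task in `TASK_STARTING`.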
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)