Kevin Klues created MESOS-5799: ---------------------------------- Summary: docker::inspect() may get wrong output when a docker container is not in "running" state Key: MESOS-5799 URL: https://issues.apache.org/jira/browse/MESOS-5799 Project: Mesos Issue Type: Bug Components: containerization, docker Reporter: Kevin Klues Fix For: 1.0.0
I (klueska) am copying the text from an email I got about a bug report from Yubo Li at IBM. docker::inspect() may get wrong output when the docker container is not in "running" state. In this case, the "docker inspect" will failed to parse data, and system can not enter TASK:RUNNING status. I attached related logs in stderr, I printed the docker inspect output. The inspected output shows that the docker is in "created" status, not "running", so that many of inspect fields are invalid. Possible Fix: detect the "State->Running" field, and get success return when "State->Running" is true. {noformat} I0706 09:01:05.342895 2975 docker.cpp:780] Running docker -H unix:///var/run/docker.sock run --cpu-shares 512 --memory 536870912 -e MARATHON_APP_VERSION=2016-07-06T08:15:02.610Z -e HOST=9.186.57.67 -e MARATHON_APP_RESOURCE_CPUS=0.5 -e MARATHON_APP_RESOURCE_GPUS=1 -e MARATHON_APP_DOCKER_IMAGE=cuda_test_v0.1 -e PORT_10000=31435 -e MESOS_TASK_ID=ubuntu-gpu-32520.29f083bf-4358-11e6-b886-2ee1446b5607 -e PORT=31435 -e MARATHON_APP_RESOURCE_MEM=512.0 -e PORTS=31435 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_LABELS= -e MARATHON_APP_ID=/ubuntu-gpu-32520 -e PORT0=31435 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_CONTAINER_NAME=mesos-1875c0d3-9712-43c3-9d58-572c89fac50b-S1.cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439 -v /var/run/mesos/slaves/1875c0d3-9712-43c3-9d58-572c89fac50b-S1/frameworks/aee07017-f8e6-4ed5-8008-b4ea3a090282-0000/executors/ubuntu-gpu-32520.29f083bf-4358-11e6-b886-2ee1446b5607/runs/cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439:/mnt/mesos/sandbox --net host --device=/dev/nvidiactl:/dev/nvidiactl:rwm --device=/dev/nvidia-uvm:/dev/nvidia-uvm:rwm --device=/dev/nvidia0:/dev/nvidia0:rwm --entrypoint /bin/sh --name mesos-1875c0d3-9712-43c3-9d58-572c89fac50b-S1.cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439 cuda_test_v0.1 -c nvidia-smi && sleep 60s I0706 09:01:05.345935 2975 docker.cpp:943] Running docker -H unix:///var/run/docker.sock inspect mesos-1875c0d3-9712-43c3-9d58-572c89fac50b-S1.cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439 I0706 09:01:05.548992 2976 docker.cpp:249] Docker inspect: [ { "Id": "5a4dc17e739b60593c04abf310f2485dddea832476e83007387b612839933f5a", "Created": "2016-07-06T09:01:05.531216924Z", "Path": "/bin/sh", "Args": [ "-c", "nvidia-smi \u0026\u0026 sleep 60s" ], "State": { "Status": "created", "Running": false, "Paused": false, "Restarting": false, "OOMKilled": false, "Dead": false, "Pid": 0, "ExitCode": 0, "Error": "", "StartedAt": "0001-01-01T00:00:00Z", "FinishedAt": "0001-01-01T00:00:00Z" }, "Image": "8cf6c8da7045ec24b1e561906dfa54ab0276753ec617e139a7b2da3ef72d245e", "ResolvConfPath": "", "HostnamePath": "", "HostsPath": "", "LogPath": "", "Name": "/mesos-1875c0d3-9712-43c3-9d58-572c89fac50b-S1.cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439", "RestartCount": 0, "Driver": "aufs", "ExecDriver": "native-0.2", "MountLabel": "", "ProcessLabel": "", "AppArmorProfile": "", "ExecIDs": null, "HostConfig": { "Binds": null, "ContainerIDFile": "", "LxcConf": null, "Memory": 0, "MemoryReservation": 0, "MemorySwap": 0, "KernelMemory": 0, "CpuShares": 0, "CpuPeriod": 0, "CpusetCpus": "", "CpusetMems": "", "CpuQuota": 0, "BlkioWeight": 0, "OomKillDisable": false, "MemorySwappiness": null, "Privileged": false, "PortBindings": null, "Links": null, "PublishAllPorts": false, "Dns": null, "DnsOptions": null, "DnsSearch": null, "ExtraHosts": null, "VolumesFrom": null, "Devices": null, "NetworkMode": "", "IpcMode": "", "PidMode": "", "UTSMode": "", "CapAdd": null, "CapDrop": null, "GroupAdd": null, "RestartPolicy": { "Name": "", "MaximumRetryCount": 0 }, "SecurityOpt": null, "ReadonlyRootfs": false, "Ulimits": null, "LogConfig": { "Type": "json-file", "Config": {} }, "CgroupParent": "", "ConsoleSize": [ 0, 0 ], "VolumeDriver": "" }, "GraphDriver": { "Name": "aufs", "Data": null }, "Mounts": [], "Config": { "Hostname": "5a4dc17e739b", "Domainname": "", "User": "", "AttachStdin": false, "AttachStdout": true, "AttachStderr": true, "Tty": false, "OpenStdin": false, "StdinOnce": false, "Env": [ "MARATHON_APP_VERSION=2016-07-06T08:15:02.610Z", "HOST=9.186.57.67", "MARATHON_APP_RESOURCE_CPUS=0.5", "MARATHON_APP_RESOURCE_GPUS=1", "MARATHON_APP_DOCKER_IMAGE=cuda_test_v0.1", "PORT_10000=31435", "MESOS_TASK_ID=ubuntu-gpu-32520.29f083bf-4358-11e6-b886-2ee1446b5607", "PORT=31435", "MARATHON_APP_RESOURCE_MEM=512.0", "PORTS=31435", "MARATHON_APP_RESOURCE_DISK=0.0", "MARATHON_APP_LABELS=", "MARATHON_APP_ID=/ubuntu-gpu-32520", "PORT0=31435", "MESOS_SANDBOX=/mnt/mesos/sandbox", "MESOS_CONTAINER_NAME=mesos-1875c0d3-9712-43c3-9d58-572c89fac50b-S1.cfe287a0-8a37-4a0f-8ffb-55eb0e6e4439", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" ], "Cmd": [ "-c", "nvidia-smi \u0026\u0026 sleep 60s" ], "Image": "cuda_test_v0.1", "Volumes": null, "WorkingDir": "", "Entrypoint": [ "/bin/sh" ], "OnBuild": null, "Labels": {}, "StopSignal": "SIGTERM" }, "NetworkSettings": { "Bridge": "", "SandboxID": "", "HairpinMode": false, "LinkLocalIPv6Address": "", "LinkLocalIPv6PrefixLen": 0, "Ports": null, "SandboxKey": "", "SecondaryIPAddresses": null, "SecondaryIPv6Addresses": null, "EndpointID": "", "Gateway": "", "GlobalIPv6Address": "", "GlobalIPv6PrefixLen": 0, "IPAddress": "", "IPPrefixLen": 0, "IPv6Gateway": "", "MacAddress": "", "Networks": null } } ] I0706 09:01:05.549659 2976 docker.cpp:335] Unable to detect IP Address at 'NetworkSettings.Networks..IPAddress', attempting deprecated field WARNING: Your kernel does not support swap limit capabilities, memory limited without swap. I0706 09:01:52.983609 2973 exec.cpp:486] Agent exited, but framework has checkpointing enabled. Waiting 15mins to reconnect with agent 1875c0d3-9712-43c3-9d58-572c89fac50b-S1 I0706 09:02:06.057607 2978 exec.cpp:549] Executor sending status update TASK_FINISHED (UUID: 2cff35f2-9512-4120-b912-74a82c197696) for task ubuntu-gpu-32520.29f083bf-4358-11e6-b886-2ee1446b5607 of framework aee07017-f8e6-4ed5-8008-b4ea3a090282-0000 I0706 09:02:06.058717 2980 poll_socket.cpp:131] Socket error while connecting I0706 09:02:06.058815 2980 process.cpp:1799] Failed to send 'mesos.internal.StatusUpdateMessage' to '127.0.1.1:5051', connect: Socket error while connecting E0706 09:02:06.058931 2980 process.cpp:2104] Failed to shutdown socket with fd 6: Transport endpoint is not connected {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)