maybob created MESOS-8105:
-----------------------------

             Summary: Docker containerizer fails with "Unable to get executor pid after launch"
                 Key: MESOS-8105
                 URL: https://issues.apache.org/jira/browse/MESOS-8105
             Project: Mesos
          Issue Type: Bug
          Components: containerization
            Reporter: maybob


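When many commands are launched at the same time through the Docker containerizer, each using the same executor with a different executorId, some of these executors fail with the error "Unable to get executor pid after launch".
The cause may be that "docker inspect" hangs or never returns. In the log below, the last "docker inspect" for the container runs at 16:15:03, and the launch fails about 26 seconds later, at 16:15:29.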
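Below is a minimal sketch of one way to guard against a hung inspect. It assumes the `docker` CLI is on PATH and is not the Mesos implementation; `inspect_with_timeout` and its parameters are hypothetical. The sketch works as follows:

- Each attempt is killed after `attempt_timeout` seconds.
- A non-zero exit is retried every `retry_interval` seconds. The default of 1 second matches the "interval: 1secs" in the log.
- The loop gives up after roughly `deadline` seconds.

```python
import subprocess
import time
from typing import List, Optional


def inspect_with_timeout(
    cmd: List[str],
    attempt_timeout: float = 5.0,
    retry_interval: float = 1.0,
    deadline: float = 30.0,
) -> Optional[str]:
    """Run `cmd` (e.g. a `docker inspect <name>` argv) until it exits 0.

    Hypothetical helper: a hung attempt is killed after `attempt_timeout`
    seconds instead of blocking forever; a non-zero exit or a timeout is
    retried after `retry_interval` seconds; the loop stops once roughly
    `deadline` seconds have passed. Returns stdout on success, else None.
    """
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=attempt_timeout
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the hung child; treat it as a failed attempt.
            result = None
        if result is not None and result.returncode == 0:
            return result.stdout
        time.sleep(retry_interval)
    return None
```

For the failing container in this report, `cmd` would be `["docker", "-H", "unix:///var/run/docker.sock", "inspect", "mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3"]`. The per-attempt timeout is the key design choice. A retry loop that only reacts to a non-zero exit status cannot make progress if a single call blocks.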

{color:red}Log:{color}

{code:java}
I1012 16:15:01.003931 124081 slave.cpp:1619] Got assigned task '920860' for framework framework-id-daily
I1012 16:15:01.006091 124081 slave.cpp:1900] Authorizing task '920860' for framework framework-id-daily
I1012 16:15:01.008281 124081 slave.cpp:2087] Launching task '920860' for framework framework-id-daily
I1012 16:15:01.008779 124081 paths.cpp:573] Trying to chown '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3' to user 'maybob'
I1012 16:15:01.009027 124081 slave.cpp:7401] Checkpointing ExecutorInfo to '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/executor.info'
I1012 16:15:01.009546 124081 slave.cpp:7038] Launching executor 'Executor_920860' of framework framework-id-daily with resources {} in work directory '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
I1012 16:15:01.010339 124081 slave.cpp:7429] Checkpointing TaskInfo to '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3/tasks/920860/task.info'
I1012 16:15:01.010726 124081 slave.cpp:2316] Queued task '920860' for executor 'Executor_920860' of framework framework-id-daily
I1012 16:15:01.011740 124088 docker.cpp:1175] Starting container '29c82b61-1242-4de9-80cf-16f46c30e7e3' for executor 'Executor_920860' and framework framework-id-daily
I1012 16:15:01.013123 124081 slave.cpp:877] Successfully attached file '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
I1012 16:15:01.013290 124080 fetcher.cpp:353] Starting to fetch URIs for container: 29c82b61-1242-4de9-80cf-16f46c30e7e3, directory: /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3

I1012 16:15:01.706429 124071 docker.cpp:909] Running docker -H unix:///var/run/docker.sock run --cpu-shares 378 --memory 427819008 -e LIBPROCESS_PORT=0 -e MESOS_AGENT_ENDPOINT=xxx.xxx.xxx.xxx:5051 -e MESOS_CHECKPOINT=1 -e MESOS_CONTAINER_NAME=mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3 -e MESOS_DIRECTORY=/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3 -e MESOS_EXECUTOR_ID=Executor_920860 -e MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs -e MESOS_FRAMEWORK_ID=framework-id-daily -e MESOS_HTTP_COMMAND_EXECUTOR=0 -e MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e MESOS_RECOVERY_TIMEOUT=15mins -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_SLAVE_ID=89192f68-d28f-498c-808f-442a1ef576b3-S2 -e MESOS_SLAVE_PID=slave(1)@xxx.xxx.xxx.xxx:5051 -e MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs -v /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3 reg.docker.xxx/xxxxxx/executor:v25 -c env && cd $MESOS_SANDBOX && ./executor.sh
I1012 16:15:01.717859 124071 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3

I1012 16:15:02.033951 124085 docker.cpp:1118] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3', interval: 1secs

I1012 16:15:03.034230 124090 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3

I1012 16:15:03.518020 124078 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3

I1012 16:15:29.554232 124076 docker.cpp:1753] Updated 'cpu.shares' to 378 at /sys/fs/cgroup/cpuset,cpu,cpuacct/docker/506757580a6fe6529e58560a60db7c7311f9411185211b14e64586b12a7a8427 for container 29c82b61-1242-4de9-80cf-16f46c30e7e3
I1012 16:15:29.556495 124076 docker.cpp:1817] Updated 'memory.soft_limit_in_bytes' to 408MB for container 29c82b61-1242-4de9-80cf-16f46c30e7e3
E1012 16:15:29.559406 124082 slave.cpp:5097] Container '29c82b61-1242-4de9-80cf-16f46c30e7e3' for executor 'Executor_920860' of framework framework-id-daily failed to start: Unable to get executor pid after launch
I1012 16:15:29.559644 124068 docker.cpp:2102] Container 29c82b61-1242-4de9-80cf-16f46c30e7e3 launch failed
I1012 16:15:29.559890 124077 slave.cpp:5210] Executor 'Executor_920860' of framework framework-id-daily has terminated with unknown status
E1012 16:15:29.561193 124077 slave.cpp:4545] Failed to update resources for container 29c82b61-1242-4de9-80cf-16f46c30e7e3 of executor 'Executor_920860' running task 920860 on status update for terminal task, destroying container: Container not found
W1012 16:15:29.561326 124074 composing.cpp:646] Attempted to destroy unknown container 29c82b61-1242-4de9-80cf-16f46c30e7e3
{code}
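The log shows the last "docker inspect" at 16:15:03.518020. The next record does not appear until 16:15:29.554232, just before the launch failure, a silence of about 26 seconds. Below is a small sketch for locating such stalls in an agent log. It assumes the standard glog line prefix: severity letter, MMDD, time with microseconds, and thread id. `parse_glog_time` and `largest_gap` are hypothetical helpers. Lines without a glog prefix, such as email-wrapped continuation lines, are skipped.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# glog record prefix: severity letter, MMDD, HH:MM:SS.micros, thread id.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) +\d+ ")


def parse_glog_time(line: str) -> Optional[datetime]:
    """Return the timestamp of a glog record, or None for any other line."""
    m = GLOG_PREFIX.match(line)
    if m is None:
        return None
    # glog omits the year; 2000 is an arbitrary leap-year placeholder so that
    # "0229" still parses. Only differences between timestamps are used.
    return datetime.strptime(
        "2000" + m.group(1) + " " + m.group(2), "%Y%m%d %H:%M:%S.%f"
    )


def largest_gap(lines: List[str]) -> Tuple[float, str, str]:
    """Return (seconds, earlier record, later record) for the longest silence."""
    records = []
    for line in lines:
        ts = parse_glog_time(line)
        if ts is not None:
            records.append((ts, line))
    best: Tuple[float, str, str] = (0.0, "", "")
    # Compare each record with the one that follows it.
    for (t0, l0), (t1, l1) in zip(records, records[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best
```

Applied to the records above, this sketch reports a gap of 26.036212 seconds between the 16:15:03.518020 inspect and the 16:15:29.554232 cgroup update.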



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
