[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383511#comment-16383511 ]

Stefan Eder commented on MESOS-8158:
------------------------------------

We are experiencing the same issue with Mesos 1.4.1 (it did not work with 
previous versions, e.g. 1.0 and 1.3, either).

The Mesos agent is started like this (note: the issue also occurs without 
{{--pid=host}}):
{noformat}
ExecStart=/usr/bin/docker run \
      --name=mesos_agent \
      --net=host \
      --pid=host \
      --privileged \
      -v /cgroup:/cgroup \
      -v /sys:/sys \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /mnt/data/mesos/slave/work:/mnt/data/mesos/slave/work \
      -v /mnt/data/mesos/slave/logs:/mnt/data/mesos/slave/logs \
    infonova/mesos-agent:1.4.1-docker-17.09.0-ce \
      --ip=10.0.0.100 \
      --logging_level=INFO \
      --advertise_ip=10.0.0.100 \
      --port=5051 \
      --advertise_port=5051 \
      --master=zk://10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181/MesosDevelopment \
      --containerizers=docker \
      --recover=reconnect \
      --resources=cpus:6;mem:45000;ports:[8000-9000] \
      --log_dir=/mnt/data/mesos/slave/logs \
      --work_dir=/mnt/data/mesos/slave/work \
      --docker_remove_delay=10mins \
      --executor_registration_timeout=20mins \
      --executor_shutdown_grace_period=10mins \
      --hostname=mesos-dev-agent-01 \
      --attributes=host:mesos-dev-agent-01;type:openstacknode;dockerfs:overlayfs \
      --credential=file:///etc/mesos-slave/password \
      --recovery_timeout=24hrs \
      --docker_config=file:///home/jenkins/.docker/config.json \
      --no-systemd_enable_support \
      --docker_mesos_image=infonova/mesos:1.4.1-docker-17.09.0-ce{noformat}
The mesos-agent logs indicate that it does not find the actual task container:
{noformat}
I0302 09:33:34.609871     6 docker.cpp:1136] Starting container 
'f00102b3-f1bd-4575-86fd-769f198db674' for task 'jenkins-mesos-agent-01' (and 
executor 'jenkins-mesos-agent-01') of framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:35.321282     7 slave.cpp:3928] Got registration for executor 
'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 
from executor(1)@10.0.0.100:45619
I0302 09:33:35.322042    11 docker.cpp:1616] Ignoring updating container 
f00102b3-f1bd-4575-86fd-769f198db674 because resources passed to update are 
identical to existing resources
I0302 09:33:35.322141    11 slave.cpp:2598] Sending queued task 
'jenkins-mesos-agent-01' to executor 'jenkins-mesos-agent-01' of framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 at executor(1)@10.0.0.100:45619
E0302 09:33:56.000396    10 slave.cpp:5285] Container 
'f00102b3-f1bd-4575-86fd-769f198db674' for executor 'jenkins-mesos-agent-01' of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 failed to start: Failed to 
run 'docker -H unix:///var/run/docker.sock inspect 
mesos-f00102b3-f1bd-4575-86fd-769f198db674': exited with status 1; 
stderr='Error: No such object: mesos-f00102b3-f1bd-4575-86fd-769f198db674'
I0302 09:33:56.000726     7 slave.cpp:5398] Executor 'jenkins-mesos-agent-01' 
of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 has terminated with 
unknown status
I0302 09:33:56.000792     7 slave.cpp:4392] Handling status update TASK_FAILED 
(UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 from @0.0.0.0:0
W0302 09:33:56.001066    12 docker.cpp:1603] Ignoring updating unknown 
container f00102b3-f1bd-4575-86fd-769f198db674
I0302 09:33:56.001164    10 status_update_manager.cpp:323] Received status 
update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001391    10 status_update_manager.cpp:834] Checkpointing UPDATE 
for status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for 
task jenkins-mesos-agent-01 of framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.001545    10 slave.cpp:4873] Forwarding the update TASK_FAILED 
(UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task jenkins-mesos-agent-01 of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 to master@10.0.0.81:5050
I0302 09:33:56.023033     7 status_update_manager.cpp:395] Received status 
update acknowledgement (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023077     7 status_update_manager.cpp:834] Checkpointing ACK 
for status update TASK_FAILED (UUID: d5c38f0b-80b8-4b4b-8bf0-34d08196f053) for 
task jenkins-mesos-agent-01 of framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:33:56.023187     8 slave.cpp:5509] Cleaning up executor 
'jenkins-mesos-agent-01' of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 
at executor(1)@10.0.0.100:45619{noformat}
The funny thing is, though, that the task (in our case a Jenkins agent) starts, 
Jenkins even runs its build on it, and, depending on how long the build takes 
(anything shorter than the executor timeouts, I guess), the build even 
succeeds. However, the Mesos agent logs seem to indicate that something might 
be wrong:
{noformat}
I0302 09:34:11.504099    13 slave.cpp:4392] Handling status update TASK_RUNNING 
(UUID: f4fd5310-2be7-4f0b-8f18-9c09a46e8dc4) for task jenkins-mesos-agent-01 of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 from 
executor(1)@10.0.0.100:45619
W0302 09:34:11.504151    13 slave.cpp:4448] Ignoring status update TASK_RUNNING 
(UUID: f4fd5310-2be7-4f0b-8f18-9c09a46e8dc4) for task jenkins-mesos-agent-01 of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 for unknown framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110642     7 slave.cpp:4392] Handling status update 
TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 
from executor(1)@10.0.0.100:45619
W0302 09:35:48.110693     7 slave.cpp:4507] Could not find the executor for 
status update TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for 
task jenkins-mesos-agent-01 of framework 
806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110776     7 status_update_manager.cpp:323] Received status 
update TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
I0302 09:35:48.110934     7 slave.cpp:4873] Forwarding the update TASK_FINISHED 
(UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of 
framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 to master@10.0.0.81:5050
I0302 09:35:48.110986     7 slave.cpp:4783] Sending acknowledgement for status 
update TASK_FINISHED (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012 
to executor(1)@10.0.0.100:45619
I0302 09:35:48.121263    12 status_update_manager.cpp:395] Received status 
update acknowledgement (UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task 
jenkins-mesos-agent-01 of framework 806cbd1f-6dcf-43b8-b8ed-682995a66dbe-0012
E0302 09:35:48.121376    12 slave.cpp:3680] Status update acknowledgement 
(UUID: fb02fba6-97e6-40b8-ad03-a07708512186) for task jenkins-mesos-agent-01 of 
unknown executor
I0302 09:35:49.133253     7 slave.cpp:5005] Got exited event for 
executor(1)@10.0.0.100:45619{noformat}
However, even though the agent seems not to know about the executor, according 
to {{docker inspect f00102b3-f1bd-4575-86fd-769f198db674.executor}} it finished 
at {{"FinishedAt": "2018-03-02T09:35:49.148439002Z"}}, i.e. right when the 
agent states that it got the "exited event" for the executor.
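
For reference, the finish timestamp can be read directly with a format 
template (just one way to check it):
{noformat}
docker inspect --format '{{ .State.FinishedAt }}' f00102b3-f1bd-4575-86fd-769f198db674.executor
# 2018-03-02T09:35:49.148439002Z{noformat}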

 

When we remove the {{--docker_mesos_image}} flag, everything works as expected 
(except for recovery when the containerized Mesos agent is stopped).
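
If it helps, one way to see the difference on the agent host is to list the 
Mesos-managed containers (a rough check; the filter just matches the 
container-name pattern from the logs above). If I understand the flag 
correctly, the {{.executor}} container only exists when 
{{--docker_mesos_image}} is set; without it the executor runs as a plain 
process under the agent:
{noformat}
docker ps -a --filter 'name=mesos-' --format '{{.Names}}\t{{.Status}}'
# with the flag:    mesos-<uuid> (task) and mesos-<uuid>.executor
# without the flag: mesos-<uuid> only{noformat}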

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8158
>                 URL: https://issues.apache.org/jira/browse/MESOS-8158
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, docker, executor
>    Affects Versions: 1.4.0
>         Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>            Reporter: Charles Allen
>            Priority: Major
>
> I have attempted to launch Mesos agents inside a Docker container in such a 
> way that the agent container can be replaced and recovered. Unfortunately I 
> hit a major snag in the way Mesos launches Docker containers.
> To test basic functionality, a Marathon app is set up with the following 
> command: {{date && python -m SimpleHTTPServer $PORT0}}
> That way the HTTP port can be accessed to verify that ports are being 
> assigned correctly, and the date is printed in the log.
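> A minimal sketch of such an app definition, reconstructed from the 
> MARATHON_APP_* environment variables visible in the agent logs below; the 
> Marathon endpoint is a placeholder:
> {code}
> curl -X POST http://<marathon-host>:8080/v2/apps \
>   -H 'Content-Type: application/json' -d '{
>     "id": "/testapp",
>     "cmd": "date && python -m SimpleHTTPServer $PORT0",
>     "cpus": 1.0,
>     "mem": 128,
>     "instances": 1,
>     "container": {"type": "DOCKER", "docker": {"image": "python:2", "network": "HOST"}},
>     "portDefinitions": [{"port": 10000, "protocol": "tcp", "name": "http"}]
>   }'
> {code}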
> When I attempt to start this Marathon app, the Mesos agent (inside a Docker 
> container) properly launches an executor, which in turn properly creates a 
> second container that launches the Python code. Here is the output from the 
> executor logs (this looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent 
> d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 
> 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e 
> HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e 
> MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS
> =1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e 
> MARATHON_APP_RESOURCE_MEM=128.0 -e 
> MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e 
> MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TA
> SK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e 
> PORT0=31464 -e PORTS=31464 -e PORT_10000=31464 -e PORT_HTTP=31464 -v 
> /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp
> .fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
>  --net host --entrypoint /bin/sh --name 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 
> --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 
> -c date && p
> ython -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container 
> not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov  1 20:34:06 UTC 2017
> {code}
> But somehow a TASK_FAILED message is sent to Marathon.
> Upon further investigation, the following snippet can be found in the agent 
> logs (the agent is running in a Docker container):
> {code}
> I1101 20:34:00.949129     9 slave.cpp:1736] Got assigned task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150     9 gc.cpp:93] Unscheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  from gc
> I1101 20:34:00.950225     9 gc.cpp:93] Unscheduling 
> '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  from gc
> I1101 20:34:00.950472    12 slave.cpp:2003] Authorizing task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.951210    12 slave.cpp:2171] Launching task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.952265    12 paths.cpp:578] Trying to chown 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec7
> 5' to user 'root'
> I1101 20:34:00.952733    12 slave.cpp:7256] Launching executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 with resources 
> [{"allocation_info":{"role":"*"},"name":"cpus","scalar":{"value":0.1},"type":"SCALAR"},{"a
> llocation_info":{"role":"*"},"name":"mem","scalar":{"value":32.0},"type":"SCALAR"}]
>  in work directory 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac1100
> 02/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
> I1101 20:34:00.953045    12 slave.cpp:2858] Launching container 
> 84f9ae30-9d4c-484a-860c-ca7845b7ec75 for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.955057     6 docker.cpp:1136] Starting container 
> '84f9ae30-9d4c-484a-860c-ca7845b7ec75' for task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' (and executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002') of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45
> -0001
> I1101 20:34:00.955263    12 slave.cpp:2400] Queued task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.955965     6 docker.cpp:1531] Running docker -H 
> unix:///var/run/docker.sock inspect python:2
> I1101 20:34:01.037293     8 docker.cpp:454] Docker pull python:2 completed
> I1101 20:34:01.038180     8 docker.cpp:1080] Running docker -H 
> unix:///var/run/docker.sock run --cpu-shares 1126 --memory 167772160 -e 
> LIBPROCESS_IP=10.0.75.2 -e LIBPROCESS_PORT=0 -e 
> MESOS_AGENT_ENDPOINT=10.0.75.2:5051 -e MESOS_CHECKPOINT=1 -e 
> MESOS_CONTAINER_NAME=mesos
> -84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor -e 
> MESOS_DIRECTORY=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca784
> 5b7ec75 -e MESOS_EXECUTOR_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e 
> MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs -e 
> MESOS_FRAMEWORK_ID=a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 -e 
> MESOS_HTTP_COMMAND_EXECUTOR=0 -e 
> MESOS_NATIVE_JAVA_LIBRARY=/usr/lib/libmesos-1.4.0.
> so -e MESOS_NATIVE_LIBRARY=/usr/lib/libmesos-1.4.0.so -e 
> MESOS_RECOVERY_TIMEOUT=15mins -e MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_SLAVE_ID=d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0 -e 
> MESOS_SLAVE_PID=slave(1)@10.0.75.2:5051 -e 
> MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs -v /va
> r/run/docker.sock:/var/run/docker.sock:ro -v 
> /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/var/run/meso
> s/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:rw
>  -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/fra
> meworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox
>  --net host --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor 
> --pid=host --cap-add=SYS_ADMIN --c
> ap-add=SYS_PTRACE mesos-docker /usr/libexec/mesos/mesos-docker-executor 
> --cgroups_enable_cfs=false 
> --container=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --docker=docker 
> --docker_socket=/var/run/docker.sock --help=false 
> --initialize_driver_logging=true --launcher_dir=/u
> sr/libexec/mesos --logbufsecs=0 --logging_level=INFO 
> --mapped_directory=/mnt/mesos/sandbox --quiet=false 
> --sandbox_directory=/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf4
> 3-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75 
> --stop_timeout=0ns
> I1101 20:34:01.040096     8 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor', interval: 1secs
> I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 
> 84f9ae30-9d4c-484a-860c-ca7845b7ec75 because resources passed to update are 
> identical to existing resources
> I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' to executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
> I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> E1101 20:34:04.156435    12 slave.cpp:5292] Container 
> '84f9ae30-9d4c-484a-860c-ca7845b7ec75' for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 failed to start: Failed to run 
> 'docker -H unix:///var/run/dock
> er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with 
> status 1; stderr='Error: No such object: 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> '
> I1101 20:34:04.156497    12 docker.cpp:2078] Container 
> 84f9ae30-9d4c-484a-860c-ca7845b7ec75 launch failed
> I1101 20:34:04.156622     7 slave.cpp:5405] Executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 has terminated with unknown status
> I1101 20:34:04.156958     7 slave.cpp:4399] Handling status update 
> TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from @0.0.0.0:0
> E1101 20:34:04.157133     7 slave.cpp:4721] Failed to update resources for 
> container 84f9ae30-9d4c-484a-860c-ca7845b7ec75 of executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' running task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 on status update for terminal t
> ask, destroying container: Container not found
> W1101 20:34:04.157173    11 composing.cpp:582] Attempted to destroy unknown 
> container 84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.159068    12 status_update_manager.cpp:323] Received status 
> update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.159276    12 status_update_manager.cpp:834] Checkpointing 
> UPDATE for status update TASK_FAILED (UUID: 
> 32c43a03-cede-49f8-9676-fd9411382c58) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.159399    12 slave.cpp:4880] Forwarding the update TASK_FAILED 
> (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 to master@10.0.75.2:5050
> I1101 20:34:04.294680     8 status_update_manager.cpp:395] Received status 
> update acknowledgement (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.294747     8 status_update_manager.cpp:834] Checkpointing ACK 
> for status update TASK_FAILED (UUID: 32c43a03-cede-49f8-9676-fd9411382c58) 
> for task testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.294945     8 slave.cpp:5516] Cleaning up executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
> I1101 20:34:04.295308     8 slave.cpp:5612] Cleaning up framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:04.295418     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
>  for gc
>  6.99999658438519days in the future
> I1101 20:34:04.295459     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
>  for gc 6.99999658329778days in the future
> I1101 20:34:04.295481     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75'
>  f
> or gc 6.99999658267259days in the future
> I1101 20:34:04.295516     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
>  for gc 6.99999658223407days in the future
> I1101 20:34:04.295537     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  for gc 6.99999658143407days in the future
> I1101 20:34:04.295558     8 gc.cpp:59] Scheduling 
> '/var/run/mesos/meta/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001'
>  for gc 6.9999965810637days in the future
> I1101 20:34:04.295581     8 status_update_manager.cpp:285] Closing status 
> update streams for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:06.748510     5 slave.cpp:4399] Handling status update 
> TASK_RUNNING (UUID: 001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> W1101 20:34:06.748546     5 slave.cpp:4455] Ignoring status update 
> TASK_RUNNING (UUID: 001f20d1-fe18-4cb0-9b39-cb9cd3cb9741) for task 
> testapp.fe35282f-bf43-11e7-a24b-0242ac110002 of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 for unknown framework a5eb6da1-f8ac-
> 4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:45.960857     9 slave.cpp:5828] Framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 seems to have exited. Ignoring 
> registration timeout for executor 
> 'testapp.f53fc2ce-bf43-11e7-a24b-0242ac110002'
> I1101 20:34:52.039710    12 slave.cpp:5920] Current disk usage 0.50%. Max 
> allowed age: 6.265284309533982days
> I1101 20:35:00.955883    12 slave.cpp:5828] Framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 seems to have exited. Ignoring 
> registration timeout for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002'
> I1101 20:35:52.040870    11 slave.cpp:5920] Current disk usage 0.50%. Max 
> allowed age: 6.265284309533982days
> I1101 20:36:52.041280     5 slave.cpp:5920] Current disk usage 0.50%. Max 
> allowed age: 6.265284309533982days
> I1101 20:37:52.042034    11 slave.cpp:5920] Current disk usage 0.50%. Max 
> allowed age: 6.265284309533982days
> {code}
> Of particular note are the following lines:
> {code}
> I1101 20:34:01.138551     5 docker.cpp:1290] Retrying inspect with non-zero 
> status code. cmd: 'docker -H unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor', interval: 1secs
> I1101 20:34:02.138964    13 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75.executor
> I1101 20:34:03.423805     5 slave.cpp:3935] Got registration for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 from executor(1)@10.0.75.2:35675
> I1101 20:34:03.424316     5 docker.cpp:1616] Ignoring updating container 
> 84f9ae30-9d4c-484a-860c-ca7845b7ec75 because resources passed to update are 
> identical to existing resources
> I1101 20:34:03.424396     5 slave.cpp:2605] Sending queued task 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' to executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 at executor(1)@10.0.75.2:35675
> I1101 20:34:04.052783    11 docker.cpp:1243] Running docker -H 
> unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> E1101 20:34:04.156435    12 slave.cpp:5292] Container 
> '84f9ae30-9d4c-484a-860c-ca7845b7ec75' for executor 
> 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' of framework 
> a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001 failed to start: Failed to run 
> 'docker -H unix:///var/run/dock
> er.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75': exited with 
> status 1; stderr='Error: No such object: 
> mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> '
> {code}
> For some reason it seems like the AGENT is trying to connect to the TASK 
> docker container in addition to the executor container, failing to find it, 
> and dying immediately instead of obeying a retry mechanism. I do not have an 
> environment set up to trace the code path of this error state, but the 
> expected behavior is either that the agent leaves it to the executor to 
> track the task, or that the agent respects retries. Neither of these seems 
> to be happening here.
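> A rough shell illustration (not Mesos source code) of the polling behavior 
> the executor's own logs show above, and which the agent-side inspect does 
> not appear to follow; the container name is the one from the logs:
> {code}
> # keep inspecting until the container object exists, instead of giving up on
> # the first non-zero exit; the executor retries like this roughly every 500ms
> until docker -H unix:///var/run/docker.sock inspect \
>     mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 >/dev/null 2>&1; do
>   sleep 0.5
> done
> {code}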


