[
https://issues.apache.org/jira/browse/MESOS-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293175#comment-14293175
]
Ian Babrou commented on MESOS-2252:
-----------------------------------
I added GLOG_v=1 and MESOS_VERBOSE=1 just in case, and got the "future discarded"
error during a rolling cluster restart. Logs are here:
https://gist.github.com/bobrik/e9032a01b3d0506df93c
It now seems that the image was not pulled within 3 minutes (my
executor_registration_timeout), and this is the root cause. I think this should
be included in the logs even without GLOG_v=1. It'd be great to see it in the
sandbox too: MESOS-2020.
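To illustrate the kind of message that would help, here is a minimal Python
sketch (not Mesos code) that runs a step under a deadline and, when the deadline
expires, says which step timed out and after how long instead of a bare "future
discarded". The helper name run_with_deadline, the LaunchTimeout exception and
the error wording are all hypothetical; the sleeping child process only stands
in for a slow "docker pull".

```python
import subprocess
import sys


class LaunchTimeout(Exception):
    """Raised when a launch step does not finish before the deadline."""


def run_with_deadline(cmd, timeout_s, what):
    """Run cmd; on timeout raise an error naming the step and the limit.

    Hypothetical helper for illustration only, not part of Mesos.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s, check=True,
                              capture_output=True, text=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        raise LaunchTimeout(
            f"{what} did not finish within {timeout_s}s "
            f"(executor_registration_timeout); command: {' '.join(cmd)}"
        ) from None


if __name__ == "__main__":
    # Stand-in for a slow 'docker pull': a child that sleeps past the deadline.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    try:
        run_with_deadline(slow, timeout_s=0.5, what="image pull")
    except LaunchTimeout as e:
        print(e)
```

With my settings the deadline would be 180 seconds; the point is only that the
failure names the step and the limit that was hit.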
One interesting thing that does not belong to this issue:
I0127 07:59:53.326792 7 monitor.cpp:140] Failed to collect resource usage
for container '6b6339de-1737-473c-9d97-eaab1e4d9f6d' for executor
'topface_scruffy_proxy.5655adb2-a5fa-11e4-9eee-56847afe9799' of framework
'20141003-172543-3892422848-5050-1-0000': Failed to get usage: No process found
at 13748
The PID is obtained from "docker inspect" and is a PID in the root pidns. The
Mesos slave runs in its own pidns and cannot see that PID. Is there an issue for
this too?
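The visibility problem can be checked from inside the slave's container with a
few lines of Python. This is a minimal sketch, assuming Linux; pid_visible is a
hypothetical helper name and this is not how Mesos itself looks up processes.
Signal 0 delivers nothing and only asks the kernel whether the PID resolves in
the caller's PID namespace.

```python
import errno
import os


def pid_visible(pid):
    """Return True if pid names a process in the caller's PID namespace."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process in this namespace
        if e.errno == errno.EPERM:
            return True   # process exists but belongs to another user
        raise
    return True


if __name__ == "__main__":
    # The caller's own PID always resolves in its own namespace.
    print(pid_visible(os.getpid()))
```

Calling pid_visible(13748) inside the slave's pidns would be the equivalent of
the check that fails in the log line above, since that number only has meaning
in the root pidns.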
> Docker containers fail to start with "future discarded" error
> -------------------------------------------------------------
>
> Key: MESOS-2252
> URL: https://issues.apache.org/jira/browse/MESOS-2252
> Project: Mesos
> Issue Type: Bug
> Components: docker, slave
> Affects Versions: 0.21.0
> Environment: Mesos slaves in containers, image
> mesosphere/mesos-slave:0.21.0-1.0.ubuntu1404 on docker hub. Docker 1.4.1,
> marathon 0.8.0-SNAPSHOT
> Reporter: Ian Babrou
> Labels: docker, executors, slave
>
> I tried to launch my dockerized app with 50 tasks on Marathon and all tasks
> failed to run. Usually the app works just fine.
> Backstory:
> https://github.com/mesosphere/marathon/issues/1083#issuecomment-71196704
> Marathon logs:
> [2015-01-23 13:22:30,163] INFO Starting app /topface/prod-test/app
> (mesosphere.marathon.SchedulerActions:363)
> [2015-01-23 13:22:30,165] INFO Already running 0 instances of
> /topface/prod-test/app. Not scaling.
> (mesosphere.marathon.SchedulerActions:512)
> [2015-01-23 13:22:35,339] INFO Received status update for task
> topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:22:35,367] INFO Task
> topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:22:35,368] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:22:35,369] INFO Task launch delay for [/topface/prod-test/app]
> is now [999483319 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:22:45,345] INFO Received status update for task
> topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:22:45,359] INFO Task
> topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:22:45,360] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:22:45,360] INFO Task launch delay for [/topface/prod-test/app]
> is now [999838313 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,942] INFO Received status update for task
> topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,946] INFO Task launch delay for [/topface/prod-test/app]
> is now [1149948119 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,946] INFO Task
> topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,946] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,955] INFO Received status update for task
> topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,957] INFO Task launch delay for [/topface/prod-test/app]
> is now [1321950877 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,958] INFO Task
> topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,958] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,958] INFO Received status update for task
> topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,960] INFO Task launch delay for [/topface/prod-test/app]
> is now [1519954162 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,960] INFO Task
> topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,961] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,961] INFO Received status update for task
> topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,963] INFO Task launch delay for [/topface/prod-test/app]
> is now [1746973326 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,970] INFO Task
> topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,970] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,970] INFO Received status update for task
> topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,973] INFO Task
> topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,973] INFO Task launch delay for [/topface/prod-test/app]
> is now [2008991202 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,973] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,973] INFO Received status update for task
> topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,975] INFO Task launch delay for [/topface/prod-test/app]
> is now [2309993195 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,976] INFO Task
> topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,976] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,976] INFO Received status update for task
> topface_prod-test_app.e2bb11f5-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> The first task failed to start because of network setup (the docker registry
> was unavailable). The second task ended up on the same host and failed as well:
> E0123 13:22:35.287389 13 slave.cpp:2787] Container
> '0a1225ce-98bd-4f83-a417-b7cf72bb90e8' for executor
> 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker
> pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with
> status 1 stderr = time="2015-01-23T13:22:35Z" level="fatal" msg="Error:
> Invalid registry endpoint https://docker.core.tf/v1/: Get
> https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed
> out. If this private registry supports only HTTP or HTTPS with an unknown CA
> certificate, please add `--insecure-registry docker.core.tf` to the daemon's
> arguments. In the case of HTTPS, if you have access to the registry's CA
> certificate, no need for the flag; simply place the CA certificate at
> /etc/docker/certs.d/docker.core.tf/ca.crt"
> E0123 13:22:35.303208 13 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
> E0123 13:22:35.303503 6 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> W0123 13:22:35.304908 11 docker.cpp:1184] Ignoring updating unknown
> container: 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
> E0123 13:22:45.330379 12 slave.cpp:2787] Container
> '60a2fe62-4d64-4594-b1be-7e5795d6323c' for executor
> 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker
> pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with
> status 1 stderr = time="2015-01-23T13:22:45Z" level="fatal" msg="Error:
> Invalid registry endpoint https://docker.core.tf/v1/: Get
> https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed
> out. If this private registry supports only HTTP or HTTPS with an unknown CA
> certificate, please add `--insecure-registry docker.core.tf` to the daemon's
> arguments. In the case of HTTPS, if you have access to the registry's CA
> certificate, no need for the flag; simply place the CA certificate at
> /etc/docker/certs.d/docker.core.tf/ca.crt"
> E0123 13:22:45.330746 12 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 60a2fe62-4d64-4594-b1be-7e5795d6323c
> E0123 13:22:45.340802 9 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> W0123 13:22:45.342725 11 docker.cpp:1184] Ignoring updating unknown
> container: 60a2fe62-4d64-4594-b1be-7e5795d6323c
> The third task failed because of a "future discarded" error:
> E0123 13:23:31.906733 12 slave.cpp:2787] Container
> 'bd0337a2-41f4-4308-85a9-68a3ff0475e6' for executor
> 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
> E0123 13:23:31.907039 12 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> bd0337a2-41f4-4308-85a9-68a3ff0475e6
> E0123 13:23:31.907260 7 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> The fourth task failed because of a "future discarded" error too:
> E0123 13:23:31.932677 8 slave.cpp:2787] Container
> '782c163a-9238-4f3b-b9fd-dcc50579322a' for executor
> 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
> E0123 13:23:31.933078 8 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 782c163a-9238-4f3b-b9fd-dcc50579322a
> E0123 13:23:31.967974 6 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> I think this "future discarded" thing should be fixed. Ideally, a more
> understandable error message should be introduced.