[ https://issues.apache.org/jira/browse/MESOS-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293113#comment-14293113 ]
Ian Babrou commented on MESOS-2252:
-----------------------------------
Here's the full log:
# cat /var/log/mesos/slave/web22/mesos-slave.INFO
Log file created at: 2015/01/26 11:38:55
Running on machine: web22
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0126 11:38:55.425200 1 logging.cpp:172] WARNING level logging started!
W0126 11:38:55.777034 6 state.cpp:476] Failed to find executor libprocess
pid file
'/var/lib/mesos/slave/meta/slaves/20141221-104540-3892422848-5050-1-S58/frameworks/20141003-172543-3892422848-5050-1-0000/executors/topface_prod-test_app.7e3eff14-a54d-11e4-b7fb-56847afe9799/runs/f913c455-c405-4eee-bc82-a789f1b77528/pids/libprocess.pid'
W0126 11:38:55.940167 6 state.cpp:435] Failed to find executor forked pid
file
'/var/lib/mesos/slave/meta/slaves/20141221-104540-3892422848-5050-1-S58/frameworks/20141003-172543-3892422848-5050-1-0000/executors/topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799/runs/e2ba5117-aae5-49f0-aa59-2029e5af3930/pids/forked.pid'
W0126 11:38:56.075984 6 state.cpp:435] Failed to find executor forked pid
file
'/var/lib/mesos/slave/meta/slaves/20141221-104540-3892422848-5050-1-S58/frameworks/20141003-172543-3892422848-5050-1-0000/executors/topface_prod-test_app.fbd80c15-a318-11e4-acff-56847afe9799/runs/6813e874-a6c1-4f0c-a173-79cfe7a882e2/pids/forked.pid'
W0126 12:00:10.457042 11 slave.cpp:1699] Ignoring updating pid for framework
20141028-073834-3925977280-5050-1-0007 because it does not exist
W0126 16:37:39.376584 7 slave.cpp:1699] Ignoring updating pid for framework
20141003-172543-3892422848-5050-1-0000 because it does not exist
E0126 17:50:35.759987 13 slave.cpp:2344] Failed to update resources for
container dde7a288-10f0-482b-a741-3be538004117 of executor
topface_prod-test_app.73495fc3-a581-11e4-bd65-56847afe9799 running task
topface_prod-test_app.73495fc3-a581-11e4-bd65-56847afe9799 on status update for
terminal task, destroying container: Failed to determine cgroup for the 'cpu'
subsystem: Failed to read /proc/36936/cgroup: Failed to open file
'/proc/36936/cgroup': No such file or directory
E0126 19:08:20.639984 6 slave.cpp:2344] Failed to update resources for
container e46498be-8a11-4303-b260-bffc6dace289 of executor
topface_prod-test_app.b4abd48c-a589-11e4-bd65-56847afe9799 running task
topface_prod-test_app.b4abd48c-a589-11e4-bd65-56847afe9799 on status update for
terminal task, destroying container: Failed to determine cgroup for the 'cpu'
subsystem: Failed to read /proc/39135/cgroup: Failed to open file
'/proc/39135/cgroup': No such file or directory
W0126 19:08:52.319033 13 slave.cpp:1699] Ignoring updating pid for framework
20141028-073834-3925977280-5050-1-0007 because it does not exist
E0126 19:11:22.086526 7 slave.cpp:2787] Container
'9d510692-7a8a-4dcb-b53d-59c89aa1ed6f' for executor
'topface_collectd_es_search_web521.b20999a9-a58e-11e4-9fb3-56847afe9799' of
framework '20141003-172543-3892422848-5050-1-0000' failed to start: future
discarded
E0126 19:11:22.086645 7 slave.cpp:2882] Termination of executor
'topface_collectd_es_search_web521.b20999a9-a58e-11e4-9fb3-56847afe9799' of
framework '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
9d510692-7a8a-4dcb-b53d-59c89aa1ed6f
E0126 19:11:22.086786 11 slave.cpp:3134] Failed to unmonitor container for
executor topface_collectd_es_search_web521.b20999a9-a58e-11e4-9fb3-56847afe9799
of framework 20141003-172543-3892422848-5050-1-0000: Not monitored
W0126 19:11:22.095234 11 docker.cpp:1183] Ignoring updating unknown
container: 9d510692-7a8a-4dcb-b53d-59c89aa1ed6f
W0127 01:19:47.922758 9 slave.cpp:1548] Cannot shut down unknown framework
20150126-100626-3892422848-5050-1-0000
I'm running the slaves with "--logging_level=WARNING --quiet". Could that be the issue?
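
For anyone who wants to filter a log like the one above, here is a minimal sketch, assuming Python 3, that parses lines in the "Log line format" declared in the file header and re-joins messages that were wrapped across several lines. The names HEADER, Record, parse and SAMPLE are made up for illustration and are not part of Mesos; the sample lines are copied from the log above.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Header of one log line, following the declared format:
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
HEADER = re.compile(
    r"^(?P<severity>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+)\s+(?P<source>[^\s\]]+:\d+)\] (?P<message>.*)$"
)


class Record(NamedTuple):
    severity: str   # one of I, W, E, F
    timestamp: str  # "mmdd hh:mm:ss.uuuuuu" (the format carries no year)
    thread: int
    source: str     # "file:line"
    message: str


def parse(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per log entry, folding wrapped lines into the message."""
    current: Optional[Record] = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.match(line)
        if match:
            if current is not None:
                yield current
            current = Record(
                severity=match["severity"],
                timestamp=f"{match['mmdd']} {match['time']}",
                thread=int(match["thread"]),
                source=match["source"],
                message=match["message"],
            )
        elif current is not None:
            # Not a header: treat it as the continuation of the previous message.
            current = current._replace(message=f"{current.message} {line.strip()}")
        # Lines before the first header (file preamble) are skipped.
    if current is not None:
        yield current


SAMPLE = """\
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
W0126 11:38:55.425200 1 logging.cpp:172] WARNING level logging started!
E0126 19:11:22.086526 7 slave.cpp:2787] Container
'9d510692-7a8a-4dcb-b53d-59c89aa1ed6f' for executor
'topface_collectd_es_search_web521.b20999a9-a58e-11e4-9fb3-56847afe9799' of
framework '20141003-172543-3892422848-5050-1-0000' failed to start: future
discarded
W0126 19:11:22.095234 11 docker.cpp:1183] Ignoring updating unknown
container: 9d510692-7a8a-4dcb-b53d-59c89aa1ed6f
"""

if __name__ == "__main__":
    # Print only error and fatal entries, one per line.
    for record in parse(SAMPLE.splitlines()):
        if record.severity in ("E", "F"):
            print(record.timestamp, record.source, record.message)
```

Joining continuation lines with a single space is a guess that suits logs wrapped at whitespace, as in this mail; it would insert stray spaces if a wrap fell in the middle of a token.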
> Docker containers fail to start with "future discarded" error
> -------------------------------------------------------------
>
> Key: MESOS-2252
> URL: https://issues.apache.org/jira/browse/MESOS-2252
> Project: Mesos
> Issue Type: Bug
> Components: docker, slave
> Affects Versions: 0.21.0
> Environment: Mesos slaves in containers, image
> mesosphere/mesos-slave:0.21.0-1.0.ubuntu1404 on docker hub. Docker 1.4.1,
> marathon 0.8.0-SNAPSHOT
> Reporter: Ian Babrou
> Labels: docker, executors, slave
>
> I tried to launch my dockerized app with 50 tasks on Marathon and all tasks
> failed to run. Usually the app works just fine.
> Backstory:
> https://github.com/mesosphere/marathon/issues/1083#issuecomment-71196704
> Marathon logs:
> [2015-01-23 13:22:30,163] INFO Starting app /topface/prod-test/app
> (mesosphere.marathon.SchedulerActions:363)
> [2015-01-23 13:22:30,165] INFO Already running 0 instances of
> /topface/prod-test/app. Not scaling.
> (mesosphere.marathon.SchedulerActions:512)
> [2015-01-23 13:22:35,339] INFO Received status update for task
> topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:22:35,367] INFO Task
> topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:22:35,368] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:22:35,369] INFO Task launch delay for [/topface/prod-test/app]
> is now [999483319 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:22:45,345] INFO Received status update for task
> topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:22:45,359] INFO Task
> topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:22:45,360] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:22:45,360] INFO Task launch delay for [/topface/prod-test/app]
> is now [999838313 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,942] INFO Received status update for task
> topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,946] INFO Task launch delay for [/topface/prod-test/app]
> is now [1149948119 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,946] INFO Task
> topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,946] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,955] INFO Received status update for task
> topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,957] INFO Task launch delay for [/topface/prod-test/app]
> is now [1321950877 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,958] INFO Task
> topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,958] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,958] INFO Received status update for task
> topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,960] INFO Task launch delay for [/topface/prod-test/app]
> is now [1519954162 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,960] INFO Task
> topface_prod-test_app.e2bb3906-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,961] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,961] INFO Received status update for task
> topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,963] INFO Task launch delay for [/topface/prod-test/app]
> is now [1746973326 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,970] INFO Task
> topface_prod-test_app.e2c30146-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,970] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,970] INFO Received status update for task
> topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,973] INFO Task
> topface_prod-test_app.e2ba9cc2-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,973] INFO Task launch delay for [/topface/prod-test/app]
> is now [2008991202 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,973] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,973] INFO Received status update for task
> topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> [2015-01-23 13:23:31,975] INFO Task launch delay for [/topface/prod-test/app]
> is now [2309993195 nanoseconds] (mesosphere.util.RateLimiter:35)
> [2015-01-23 13:23:31,976] INFO Task
> topface_prod-test_app.e2bc4a7c-a302-11e4-bea0-56847afe9799 expunged and
> removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:107)
> [2015-01-23 13:23:31,976] INFO Sending event notification.
> (mesosphere.marathon.MarathonScheduler:262)
> [2015-01-23 13:23:31,976] INFO Received status update for task
> topface_prod-test_app.e2bb11f5-a302-11e4-bea0-56847afe9799: TASK_FAILED
> (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:148)
> The first task failed to start because of the network setup (the Docker
> registry was unavailable). The second task ended up on the same host and
> failed as well:
> E0123 13:22:35.287389 13 slave.cpp:2787] Container
> '0a1225ce-98bd-4f83-a417-b7cf72bb90e8' for executor
> 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker
> pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with
> status 1 stderr = time="2015-01-23T13:22:35Z" level="fatal" msg="Error:
> Invalid registry endpoint https://docker.core.tf/v1/: Get
> https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed
> out. If this private registry supports only HTTP or HTTPS with an unknown CA
> certificate, please add `--insecure-registry docker.core.tf` to the daemon's
> arguments. In the case of HTTPS, if you have access to the registry's CA
> certificate, no need for the flag; simply place the CA certificate at
> /etc/docker/certs.d/docker.core.tf/ca.crt"
> E0123 13:22:35.303208 13 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
> E0123 13:22:35.303503 6 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2baeae4-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> W0123 13:22:35.304908 11 docker.cpp:1184] Ignoring updating unknown
> container: 0a1225ce-98bd-4f83-a417-b7cf72bb90e8
> E0123 13:22:45.330379 12 slave.cpp:2787] Container
> '60a2fe62-4d64-4594-b1be-7e5795d6323c' for executor
> 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: Failed to 'docker
> pull docker.core.tf/topface-prod-app:20150123019': exit status = exited with
> status 1 stderr = time="2015-01-23T13:22:45Z" level="fatal" msg="Error:
> Invalid registry endpoint https://docker.core.tf/v1/: Get
> https://docker.core.tf/v1/_ping: dial tcp 10.5.1.194:443: connection timed
> out. If this private registry supports only HTTP or HTTPS with an unknown CA
> certificate, please add `--insecure-registry docker.core.tf` to the daemon's
> arguments. In the case of HTTPS, if you have access to the registry's CA
> certificate, no need for the flag; simply place the CA certificate at
> /etc/docker/certs.d/docker.core.tf/ca.crt"
> E0123 13:22:45.330746 12 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 60a2fe62-4d64-4594-b1be-7e5795d6323c
> E0123 13:22:45.340802 9 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e8f945c2-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> W0123 13:22:45.342725 11 docker.cpp:1184] Ignoring updating unknown
> container: 60a2fe62-4d64-4594-b1be-7e5795d6323c
> The third task failed because of a "future discarded" error:
> E0123 13:23:31.906733 12 slave.cpp:2787] Container
> 'bd0337a2-41f4-4308-85a9-68a3ff0475e6' for executor
> 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
> E0123 13:23:31.907039 12 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> bd0337a2-41f4-4308-85a9-68a3ff0475e6
> E0123 13:23:31.907260 7 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2bcbfae-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> The fourth task failed because of a "future discarded" error too:
> E0123 13:23:31.932677 8 slave.cpp:2787] Container
> '782c163a-9238-4f3b-b9fd-dcc50579322a' for executor
> 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed to start: future discarded
> E0123 13:23:31.933078 8 slave.cpp:2882] Termination of executor
> 'topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799' of framework
> '20141003-172543-3892422848-5050-1-0000' failed: Unknown container:
> 782c163a-9238-4f3b-b9fd-dcc50579322a
> E0123 13:23:31.967974 6 slave.cpp:3134] Failed to unmonitor container for
> executor topface_prod-test_app.e2c460d9-a302-11e4-bea0-56847afe9799 of
> framework 20141003-172543-3892422848-5050-1-0000: Not monitored
> I think this "future discarded" thing should be fixed. Ideally, a more
> understandable error message should be introduced.