[ 
https://issues.apache.org/jira/browse/MESOS-5482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121944#comment-16121944
 ] 

Mao Geng commented on MESOS-5482:
---------------------------------

Hit this issue on Mesos 1.2.0 and Marathon 1.4.3 too.
The agent timed out after 75 seconds without pings from the master, then re-registered:
{quote}
I0810 13:18:43.142431 18394 slave.cpp:4378] No pings from master received 
within 75secs
I0810 13:18:43.142588 18393 slave.cpp:920] Re-detecting master
I0810 13:18:43.142614 18393 slave.cpp:966] Detecting new master
I0810 13:18:43.142674 18407 status_update_manager.cpp:177] Pausing sending 
status updates
I0810 13:18:43.142755 18420 status_update_manager.cpp:177] Pausing sending 
status updates
I0810 13:18:43.142813 18415 slave.cpp:931] New master detected at 
master@10.1.36.4:5050
I0810 13:18:43.142840 18415 slave.cpp:955] No credentials provided. Attempting 
to register without authentication
I0810 13:18:43.142853 18415 slave.cpp:966] Detecting new master
I0810 13:18:44.431833 18415 slave.cpp:1242] Re-registered with master 
master@10.1.36.4:5050
I0810 13:18:44.431874 18415 slave.cpp:1279] Forwarding total oversubscribed 
resources {}
I0810 13:18:44.431895 18398 status_update_manager.cpp:184] Resuming sending 
status updates
I0810 13:18:44.433912 18386 slave.cpp:2683] Shutting down framework 
f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:44.433939 18386 slave.cpp:5083] Shutting down executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 at executor(1)@10.1.98.251:33041
W0810 13:18:44.435637 18440 slave.cpp:2823] Ignoring updating pid for framework 
f853458f-b07b-4b79-8192-24953f474369-0000 because it is terminating
I0810 13:18:46.878993 18408 slave.cpp:1625] Got assigned task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:46.879406 18408 slave.cpp:1785] Launching task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-0000
W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 because the framework is terminating
I0810 13:18:47.613224 18415 slave.cpp:3816] Handling status update TASK_KILLED 
(UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task 
metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 from executor(1)@10.1.98.251:33041
W0810 13:18:47.613261 18415 slave.cpp:3885] Ignoring status update TASK_KILLED 
(UUID: af78fc5c-8552-4aee-abae-cda3d0ec2909) for task 
metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5 of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 for terminating framework 
f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:48.618629 18409 slave.cpp:4388] Got exited event for 
executor(1)@10.1.98.251:33041
I0810 13:18:48.713826 18390 docker.cpp:2358] Executor for container 
1f351db2-1011-4244-83c2-1854c44d7b65 has exited
I0810 13:18:48.713850 18390 docker.cpp:2052] Destroying container 
1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.713892 18390 docker.cpp:2179] Running docker stop on container 
1f351db2-1011-4244-83c2-1854c44d7b65
I0810 13:18:48.714363 18411 slave.cpp:4769] Executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 exited with status 0
I0810 13:18:48.714390 18411 slave.cpp:4869] Cleaning up executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5' of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 at executor(1)@10.1.98.251:33041
I0810 13:18:48.714589 18411 slave.cpp:4957] Cleaning up framework 
f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:48.714607 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65'
 for gc 6.99999173026667days in the future
I0810 13:18:48.714669 18410 status_update_manager.cpp:285] Closing status 
update streams for framework f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:48.714679 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5'
 for gc 6.99999172979259days in the future
I0810 13:18:48.714709 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/meta/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5/runs/1f351db2-1011-4244-83c2-1854c44d7b65'
 for gc 6.99999172953778days in the future
I0810 13:18:48.714725 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/meta/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000/executors/metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5'
 for gc 6.99999172937481days in the future
I0810 13:18:48.714740 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000'
 for gc 6.99999172839111days in the future
I0810 13:18:48.714752 18432 gc.cpp:55] Scheduling 
'/mnt/mesos/meta/slaves/508bde0b-4661-4a29-b674-32163345096f-S229/frameworks/f853458f-b07b-4b79-8192-24953f474369-0000'
 for gc 6.9999917281837days in the future
I0810 13:18:49.435122 18446 slave.cpp:5124] Framework 
f853458f-b07b-4b79-8192-24953f474369-0000 seems to have exited. Ignoring 
shutdown timeout for executor 
'metrics_statsd.2e578bc8-7bac-11e7-9ea1-0242c1e4f2c5'
I0810 13:23:48.444691 18439 slave.cpp:2428] Asked to kill task 
metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac of framework 
f853458f-b07b-4b79-8192-24953f474369-0000
W0810 13:23:48.444766 18439 slave.cpp:2448] Ignoring kill task 
metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 because no such framework is running
....
{quote}

These messages look suspicious:
{quote}
I0810 13:18:46.878993 18408 slave.cpp:1625] Got assigned task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-0000
I0810 13:18:46.879406 18408 slave.cpp:1785] Launching task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' for framework 
f853458f-b07b-4b79-8192-24953f474369-0000
W0810 13:18:46.879436 18408 slave.cpp:1853] Ignoring running task 
'metrics_statsd.70dff634-7dce-11e7-bea2-0242f4eb80ac' of framework 
f853458f-b07b-4b79-8192-24953f474369-0000 because the framework is terminating
{quote}
It seems this left the agent stuck in an "Asked to kill"/"Ignoring kill" loop.
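The race above can be sketched as a toy model of the agent's framework bookkeeping. This is a hypothetical Python sketch, not Mesos code; the `Agent` class and its method names are illustrative, chosen only to mirror the log lines:

```python
# Toy model of the race seen in the logs: after re-registration the agent
# marks the framework terminating, so a task launch that races in is
# silently dropped; once the framework is cleaned up, every later kill
# for that task is ignored, and the scheduler retries forever.

class Agent:
    def __init__(self):
        self.frameworks = {}  # framework_id -> state

    def shutdown_framework(self, fid):
        self.frameworks[fid] = "terminating"

    def run_task(self, fid, task):
        if self.frameworks.get(fid) == "terminating":
            # Matches "Ignoring running task ... because the framework is
            # terminating" -- the launch is dropped with no status update.
            return f"ignored {task}"
        return f"launched {task}"

    def cleanup_framework(self, fid):
        self.frameworks.pop(fid, None)

    def kill_task(self, fid, task):
        if fid not in self.frameworks:
            # Matches "Ignoring kill task ... because no such framework is
            # running" -- no terminal update is sent, so the kill loops.
            return f"ignored kill {task}"
        return f"killed {task}"

agent = Agent()
fid = "f853458f-b07b-4b79-8192-24953f474369-0000"
agent.frameworks[fid] = "running"
agent.shutdown_framework(fid)    # 13:18:44 Shutting down framework
print(agent.run_task(fid, "metrics_statsd.70dff634"))   # -> ignored ...
agent.cleanup_framework(fid)     # 13:18:48 Cleaning up framework
print(agent.kill_task(fid, "metrics_statsd.70dff634"))  # -> ignored kill ...
```

Under this model the task never reaches a terminal state on the agent, which would explain why only an agent restart clears it.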



> mesos/marathon task stuck in staging after slave reboot
> -------------------------------------------------------
>
>                 Key: MESOS-5482
>                 URL: https://issues.apache.org/jira/browse/MESOS-5482
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: lutful karim
>         Attachments: marathon-mesos-masters_after-reboot.log, 
> mesos-masters_mesos.log, mesos_slaves_after_reboot.log, 
> tasks_running_before_rebooot.marathon
>
>
> The main idea of Mesos/Marathon is that operators can sleep well, but after a 
> node reboot the Mesos task gets stuck in staging for about 4 hours.
> To reproduce the issue: 
> - set up a Mesos cluster in HA mode with systemd-managed mesos-master and 
> mesos-slave services.
> - run the Docker registry (https://hub.docker.com/_/registry/ ) with a Mesos 
> constraint (hostname:LIKE:mesos-slave-1) on one node. Reboot the node and 
> notice that the task gets stuck in staging.
> Possible workaround: service mesos-slave restart fixes the issue.
> OS: centos 7.2
> mesos version: 0.28.1
> marathon: 1.1.1
> zookeeper: 3.4.8
> docker: 1.9.1 dockerAPIversion: 1.21
> error message:
> May 30 08:38:24 euca-10-254-237-140 mesos-slave[832]: W0530 08:38:24.120013   
> 909 slave.cpp:2018] Ignoring kill task 
> docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3 because the executor 
> 'docker-registry.066fb448-2628-11e6-bedd-d00d0ef81dc3' of framework 
> 8517fcb7-f2d0-47ad-ae02-837570bef929-0000 is terminating/terminated



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
