Mengkui gong created MESOS-5395:
-----------------------------------
Summary: Task gets stuck in staging state if launched on a
rebooted slave.
Key: MESOS-5395
URL: https://issues.apache.org/jira/browse/MESOS-5395
Project: Mesos
Issue Type: Bug
Affects Versions: 0.28.0
Environment: Mesos/Marathon cluster, 3 masters/4 slaves
Mesos 0.28.0, Marathon 0.15.2
Reporter: Mengkui gong
After rebooting a slave, tasks launched through Marathon start on the other
slaves without problems. A task launched on the rebooted slave, however, gets
stuck: the Mesos UI shows it in the staging state in the active tasks list,
and the Marathon UI shows it in the deploying state. It can remain stuck for
more than 2 hours. After that time, Marathon automatically launches the task,
either on the rebooted slave or on another slave, as normal, so the rebooted
slave also recovers after that time.
From the Mesos master log, I can see "telling slave to kill task" messages all the time:
I0517 15:25:27.207237 20568 master.cpp:3826] Telling slave
282745ab-423a-4350-a449-3e8cdfccfb93-S1 at slave(1)@10.254.234.236:5050
(mesos-slave-3) to kill task
project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of
framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 (marathon) at
[email protected]:56757.
From the rebooted slave's log, I can see:
May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: I0517 15:28:37.206831
916 slave.cpp:1891] Asked to kill task
project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of
framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: W0517 15:28:37.206866
916 slave.cpp:2018] Ignoring kill task
project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e because
the executor
'project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e' of
framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 is terminating/terminated.
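To check how often the master repeats the kill request, the master log can be scanned for the message quoted above. The following is only a rough sketch: the regular expression is modeled on the single master.cpp line shown in this report and may not match other Mesos versions, and the function name count_kill_requests is made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the master.cpp line quoted in this report.
# Assumption: other Mesos versions may word this message differently.
KILL_RE = re.compile(
    r"Telling slave\s+(?P<slave>\S+)\s+at\s+\S+\s+\((?P<host>[^)]*)\)\s+"
    r"to kill task\s+(?P<task>\S+)\s+of framework"
)


def count_kill_requests(log_text):
    """Count kill requests per (task ID, slave hostname) pair.

    Whitespace is collapsed first so that entries wrapped across
    several lines (as in this email) still match.
    """
    flat = " ".join(log_text.split())
    return Counter(
        (m.group("task"), m.group("host")) for m in KILL_RE.finditer(flat)
    )


if __name__ == "__main__":
    import sys

    # Usage: python count_kills.py < mesos-master.INFO
    for (task, host), n in count_kill_requests(sys.stdin.read()).most_common():
        print(f"{n:6d}  {task}  ({host})")
```

A large count for a single task on the rebooted slave would match the behavior described here: the master keeps asking for the kill while the slave ignores each request.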
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)