[
https://issues.apache.org/jira/browse/MESOS-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308926#comment-15308926
]
Gilbert Song commented on MESOS-5395:
-------------------------------------
[~Mengkui], thanks for reporting this issue. Could you reproduce it and
see whether restarting the slave process resolves it?
BTW, could you verify https://issues.apache.org/jira/browse/MESOS-5482 is
identical to this issue? Thanks. :)
> Task getting stuck in staging state if launch it on a rebooted slave.
> ---------------------------------------------------------------------
>
> Key: MESOS-5395
> URL: https://issues.apache.org/jira/browse/MESOS-5395
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.28.0
> Environment: mesos/marathon cluster, 3 masters/4 slaves
> Mesos: 0.28.0 , Marathon 0.15.2
> Reporter: Mengkui gong
> Attachments: mesos-log.zip
>
>
> After rebooting a slave, tasks launched through Marathon start on the other
> slaves without problems, but a task launched on the rebooted slave gets
> stuck. The Mesos UI shows it in the staging state in the active tasks list,
> and the Marathon UI shows it in the deploying state. It can stay stuck for
> more than 2 hours. After that time, Marathon automatically launches the task
> on the rebooted slave or on another slave as normal, so the rebooted slave
> has also recovered by then.
> In the Mesos log, I can see "Telling slave ... to kill task" all the time:
> I0517 15:25:27.207237 20568 master.cpp:3826] Telling slave
> 282745ab-423a-4350-a449-3e8cdfccfb93-S1 at slave(1)@10.254.234.236:5050
> (mesos-slave-3) to kill task
> project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of
> framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 (marathon) at
> [email protected]:56757.
> In the rebooted slave's log, I can see:
> May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: I0517 15:28:37.206831
> 916 slave.cpp:1891] Asked to kill task
> project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e of
> framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
> May 17 15:28:37 euca-10-254-234-236 mesos-slave[829]: W0517 15:28:37.206866
> 916 slave.cpp:2018] Ignoring kill task
> project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e because
> the executor
> 'project-hub_project-hub-frontend.b645f24b-1c1f-11e6-bb25-d00d2cce797e' of
> framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 is terminating/terminated.
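When reproducing this, it may help to count how often the slave ignores a kill request for each task. The following is only a minimal sketch: the function name `count_ignored_kills` is hypothetical, and the pattern is modeled on the warning text quoted above, which may be worded differently in other Mesos versions. It accepts log text that has been wrapped and `>`-quoted by a mail client, as in this message.

```python
import re
from collections import Counter

# Pattern modeled on the "Ignoring kill task" warning quoted above.
# Assumption: other Mesos versions may word this message differently.
IGNORED_KILL = re.compile(
    r"Ignoring kill task\s+(?P<task>\S+)\s+because\s+the\s+executor"
)


def count_ignored_kills(log_text: str) -> Counter:
    """Count ignored kill requests per task ID in slave log text."""
    # Drop mail-style "> " quote markers at the start of each line.
    unquoted = re.sub(r"^>[ \t]?", "", log_text, flags=re.MULTILINE)
    # Long log lines may be wrapped, so collapse all whitespace runs
    # into single spaces before matching.
    flattened = " ".join(unquoted.split())
    return Counter(m.group("task") for m in IGNORED_KILL.finditer(flattened))
```

Calling `count_ignored_kills` on the contents of the slave log returns a `Counter` keyed by task ID. A count that keeps growing for one task while it stays in the staging state would match the behavior described in this report.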