[
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352427#comment-15352427
]
chenqiang edited comment on MESOS-5718 at 6/28/16 6:17 AM:
-----------------------------------------------------------
Yes, the task was hung with its executor terminated. We upgraded the Mesos
agent to 0.28.2; after upgrading and restarting mesos-slave.service, the
running executors from the old version were recovered once the Mesos agent
re-registered.
> Mesos UI shows "Task is in RUNNING status" but can't find it on the Mesos
> agent.
> --------------------------------------------------------------------------------
>
> Key: MESOS-5718
> URL: https://issues.apache.org/jira/browse/MESOS-5718
> Project: Mesos
> Issue Type: Bug
> Reporter: chenqiang
>
> We found an issue where a task launched by Marathon in a Docker container
> shows "Task is in RUNNING status" in the Mesos UI, but cannot be found on the
> Mesos agent host. That is, the Docker container no longer exists, yet the
> task is shown as RUNNING in the Mesos UI. So interesting...
> Part of the log is attached below:
> ```
> I0627 14:31:30.239467 3913 slave.cpp:1912] Asked to kill task tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of framework 20141201-145651-1900714250-5050-3484-0000
> W0627 14:31:30.239547 3913 slave.cpp:2025] Ignoring kill task tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 because the executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646 3921 slave.cpp:4511] Sending reconnect request to executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073 3899 slave.cpp:2991] Killing un-reregistered executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183 3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375 3902 docker.cpp:1724] Destroying container 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431 3902 docker.cpp:1852] Running docker stop on container 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ```
> What is the root cause? It seems the executor of that task has terminated,
> yet the agent ignores the kill request for the task.
> FIX: After restarting mesos-slave, the RUNNING task transitions to FAILED,
> we can see it launched again on another agent, and the task returns to
> normal.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)