[ https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351803#comment-15351803 ]
Vinod Kone commented on MESOS-5718:
-----------------------------------
It looks like the executor was already in the process of terminating when the
kill task request was sent at 14:31. Do you know why the executor was
terminating? From the looks of it, though, the executor never completely
terminated (was it hung?) until after the agent was restarted about 15 minutes
later.
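
To make that timeline easier to check, here is a minimal Python sketch that
parses the glog-style prefix of the agent log lines quoted below and measures
the time between the ignored kill and the agent finally killing the executor.
It is not part of Mesos; `GLOG_RE`, `parse` and `first_matching` are
hypothetical helper names, the regex is only an assumption based on the lines
in this report, and the long task/executor IDs are elided.

```python
import re
from datetime import datetime

# glog-style prefix: severity + MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)


def parse(line):
    """Split one agent log line into its fields, or return None."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    return {
        "severity": m["sev"],
        "mmdd": m["mmdd"],
        # Clock time only; strptime fills in a dummy date.
        "time": datetime.strptime(m["time"], "%H:%M:%S.%f"),
        "source": m["src"],
        "message": m["msg"],
    }


def first_matching(records, needle):
    """Return the first parsed record whose message contains `needle`."""
    return next(r for r in records if needle in r["message"])


# Three lines from the report, with wrapped continuations joined and
# the long IDs elided for readability.
LINES = [
    "I0627 14:31:30.239467 3913 slave.cpp:1912] Asked to kill task ...",
    "W0627 14:31:30.239547 3913 slave.cpp:2025] Ignoring kill task ... "
    "because the executor ... is terminating/terminated",
    "I0624 14:46:06.399073 3899 slave.cpp:2991] Killing un-reregistered "
    "executor ...",
]

records = [r for r in map(parse, LINES) if r is not None]
ignored = first_matching(records, "Ignoring kill task")
killed = first_matching(records, "Killing un-reregistered executor")

# Difference in clock time between the two events.
gap = killed["time"] - ignored["time"]
print(f"kill ignored at {ignored['time']:%H:%M:%S}, "
      f"executor killed at {killed['time']:%H:%M:%S}, gap {gap}")
```

Note that the pasted log mixes the date prefixes 0627 and 0624, so the sketch
compares clock times only, which is the same reading as the "15 mins" above.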
> Mesos UI shows "Task is in RUNNING status" but can't find it in the Mesos
> agent.
> --------------------------------------------------------------------------------
>
> Key: MESOS-5718
> URL: https://issues.apache.org/jira/browse/MESOS-5718
> Project: Mesos
> Issue Type: Bug
> Reporter: chenqiang
>
> We have found an issue where a task launched by Marathon in a Docker container
> shows "Task is in RUNNING status" in the Mesos UI but cannot be found on the
> Mesos agent host. That is, the Docker container does not exist, yet the task
> is still shown as RUNNING in the Mesos UI.
> Part of the log is attached below:
> ```
> I0627 14:31:30.239467 3913 slave.cpp:1912] Asked to kill task
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of
> framework 20141201-145651-1900714250-5050-3484-0000
> W0627 14:31:30.239547 3913 slave.cpp:2025] Ignoring kill task
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799
> because the executor
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799'
> of framework 20141201-145651-1900714250-5050-3484-0000 at
> executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646 3921 slave.cpp:4511] Sending reconnect request to
> executor
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799'
> of framework 20141201-145651-1900714250-5050-3484-0000 at
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073 3899 slave.cpp:2991] Killing un-reregistered executor
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799'
> of framework 20141201-145651-1900714250-5050-3484-0000 at
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183 3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375 3902 docker.cpp:1724] Destroying container
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431 3902 docker.cpp:1852] Running docker stop on container
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ```
> What is the root cause? It seems the executor of that task had terminated, but
> the slave ignored the kill task request.
> FIX: After restarting mesos-slave, the RUNNING task changes to FAILED status,
> and we can see it launched again on another agent, after which the task
> returns to normal.