[ https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351803#comment-15351803 ]

Vinod Kone commented on MESOS-5718:
-----------------------------------

It looks like when the kill task request was sent at 14:31, the executor was
already in the process of terminating. Do you know why the executor was
terminating? From the looks of it, the executor never fully terminated (was it
hung?) until after the agent was restarted 15 minutes later.
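The warning in the log comes from the agent's kill-task path, which branches on the agent's view of the executor's state. The Python sketch below is a simplified, hypothetical model of that branch (`ExecutorState` and `handle_kill_task` are illustrative names, not the actual `slave.cpp` code). Once the executor is marked as terminating or terminated, the kill is dropped instead of being forwarded, on the expectation that the executor's own exit will clean up its tasks. If the executor hangs instead of exiting, nothing in this model moves the task out of RUNNING.

```python
from enum import Enum, auto


class ExecutorState(Enum):
    # Hypothetical subset of the agent's view of an executor; the real
    # agent tracks more states than this.
    RUNNING = auto()
    TERMINATING = auto()
    TERMINATED = auto()


def handle_kill_task(task_id: str, state: ExecutorState) -> str:
    """Return the action a simplified agent model takes for a kill request.

    Illustrative only: this is not the Mesos implementation.
    """
    if state in (ExecutorState.TERMINATING, ExecutorState.TERMINATED):
        # Mirrors the W-level log line: the request is ignored because the
        # executor is expected to exit and clean up its tasks on its own.
        return f"ignore kill of {task_id}: executor is terminating/terminated"
    # Otherwise the kill request is handed to the executor.
    return f"forward kill of {task_id} to executor"


if __name__ == "__main__":
    task = "tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799"
    print(handle_kill_task(task, ExecutorState.TERMINATING))
```

With the executor in `TERMINATING`, the model returns the "ignore" action that matches the 14:31 warning; with `RUNNING`, it would forward the kill instead.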

> Mesos UI shows "Task is in RUNNING status" but can't find it in the Mesos
> agent.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-5718
>                 URL: https://issues.apache.org/jira/browse/MESOS-5718
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: chenqiang
>
> We found an issue where a task launched by Marathon in a Docker container
> shows "Task is in RUNNING status" in the Mesos UI, but cannot be found on the
> Mesos agent host: the Docker container no longer exists, yet the task is
> still shown as RUNNING in the Mesos UI.
> Part of the log is attached below:
> ```
> I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of framework 20141201-145651-1900714250-5050-3484-0000
> W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 because the executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of framework 20141201-145651-1900714250-5050-3484-0000 at executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183  3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375  3902 docker.cpp:1724] Destroying container 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431  3902 docker.cpp:1852] Running docker stop on container 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ``` 
> What is the root cause? It seems the executor of that task was terminated,
> but the agent ignored the kill request for the task.
> Workaround: after restarting mesos-slave, the RUNNING task moved to FAILED,
> was relaunched on another agent, and returned to normal.
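The recovery lines can also be checked against their timestamps. The sketch below parses the glog-style prefix these lines use (severity letter, `MMDD`, time with microseconds, thread ID, `file:line]`) and measures the gap between `Sending reconnect request` and `Killing un-reregistered executor` on 06/24. `GLOG_RE` and `parse_glog_line` are hypothetical helpers, and because glog omits the year, the year is an assumed parameter. The gap comes out at about 2 seconds, which is consistent with the agent waiting out its executor re-registration timeout (the `--executor_reregistration_timeout` agent flag, whose default was, to our knowledge, 2 seconds at the time) before killing the executor and destroying its container. That appears to be how the restart finally moved the task to FAILED. Only same-day lines are compared, since the quoted log mixes `0627` and `0624` date prefixes.

```python
import re
from datetime import datetime

# glog line prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)

# Two lines from the quoted log (same day, 06/24).
RECONNECT_LINE = (
    "I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to "
    "executor 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' "
    "of framework 20141201-145651-1900714250-5050-3484-0000 at "
    "executor(1)@10.153.96.22:14578"
)
KILL_LINE = (
    "I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor "
    "'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' "
    "of framework 20141201-145651-1900714250-5050-3484-0000 at "
    "executor(1)@10.153.96.22:14578"
)


def parse_glog_line(line: str, year: int = 2016) -> tuple[datetime, str]:
    """Return (timestamp, message) for one glog line.

    glog does not record the year, so `year` is an assumption.
    """
    m = GLOG_RE.match(line)
    if m is None:
        raise ValueError(f"not a glog line: {line!r}")
    ts = datetime.strptime(f"{year}{m['mmdd']} {m['time']}", "%Y%m%d %H:%M:%S.%f")
    return ts, m["msg"]


if __name__ == "__main__":
    t_reconnect, _ = parse_glog_line(RECONNECT_LINE)
    t_kill, _ = parse_glog_line(KILL_LINE)
    gap = (t_kill - t_reconnect).total_seconds()
    print(f"reconnect -> kill un-reregistered executor: {gap:.3f} s")
```

Run as a script, it prints a gap of about 2.000 seconds between the reconnect request and the kill.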



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
