[ 
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15352427#comment-15352427
 ] 

chenqiang edited comment on MESOS-5718 at 6/28/16 6:17 AM:
-----------------------------------------------------------

Yes, it was hung with the executor terminated. We upgraded the Mesos agent to 
0.28.2; after upgrading and starting mesos-slave.service, the running executors 
from the old version would recover when the Mesos agent registered again. 


was (Author: chenqiang):
Yes, it was hung with the executor terminated. 

> Mesos UI shows "Task is in RUNNING status" but can't find it in the mesos 
> Agent.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-5718
>                 URL: https://issues.apache.org/jira/browse/MESOS-5718
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: chenqiang
>
> We have found an issue where a task launched by Marathon in a Docker container 
> shows "Task is in RUNNING status" in the Mesos UI, but cannot be found on the 
> Mesos agent host. That is, the Docker container does not exist, yet the task is 
> still shown as RUNNING in the Mesos UI. So interesting...
> Part of the log is attached below:
> ```
> I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of 
> framework 20141201-145651-1900714250-5050-3484-0000
> W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task 
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 
> because the executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484-0000 at 
> executor(1)@10.153.96.22:14578 is terminating/terminated
> I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to 
> executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484-0000 at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor 
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' 
> of framework 20141201-145651-1900714250-5050-3484-0000 at 
> executor(1)@10.153.96.22:14578
> I0624 14:46:06.399183  3899 slave.cpp:4571] Finished recovery
> I0624 14:46:06.399375  3902 docker.cpp:1724] Destroying container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> I0624 14:46:06.399431  3902 docker.cpp:1852] Running docker stop on container 
> 'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
> ``` 
> What is the root cause? It seems the executor of that task has terminated, but 
> the slave ignored the kill request for the task.
> FIX: After restarting mesos-slave, the RUNNING task changes to FAILED status, 
> and we can see it launched again on another agent; the task returns to 
> normal...
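
For anyone scanning a larger agent log for the same symptom, here is a minimal 
Python sketch, not part of the original report. It rejoins log records that 
the mail client wrapped and lists the tasks whose kill request was ignored 
because the executor was terminating/terminated. The helper names 
(`parse_records`, `ignored_kills`) are hypothetical, and the header pattern is 
only inferred from the lines quoted above, so treat it as an assumption rather 
than a full description of the log format.

```python
import re

# Header layout inferred from the quoted excerpt (an assumption):
# severity letter, MMDD, time, thread id, file:line, then "] message".
GLOG_HEADER = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[\w.]+:\d+)\] (?P<message>.*)$"
)

# Wording taken from the "Ignoring kill task" warning in the excerpt.
IGNORED_KILL = re.compile(
    r"Ignoring kill task (?P<task>\S+) because the executor .* "
    r"is terminating/terminated"
)

# Two records copied from the report, still wrapped and quoted as in the mail.
SAMPLE = """\
> I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of
> framework 20141201-145651-1900714250-5050-3484-0000
> W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task
> tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799
> because the executor
> 'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799'
> of framework 20141201-145651-1900714250-5050-3484-0000 at
> executor(1)@10.153.96.22:14578 is terminating/terminated
"""


def parse_records(text):
    """Rejoin wrapped lines and return one dict per log record."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Drop one level of mail quoting.
            line = line[1:].strip()
        if not line or line.startswith("```"):
            continue
        match = GLOG_HEADER.match(line)
        if match:
            records.append(match.groupdict())
        elif records:
            # No header: treat the line as a continuation of the last record.
            records[-1]["message"] += " " + line
    return records


def ignored_kills(records):
    """Return task IDs whose kill was ignored because the executor was gone."""
    tasks = []
    for record in records:
        match = IGNORED_KILL.search(record["message"])
        if match:
            tasks.append(match.group("task"))
    return tasks


if __name__ == "__main__":
    for task in ignored_kills(parse_records(SAMPLE)):
        print(task)
```

Task IDs reported this way could then be compared by hand against the 
containers actually present on the agent host.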



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
