[ 
https://issues.apache.org/jira/browse/MESOS-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenqiang updated MESOS-5718:
-----------------------------
    Description: 
We have found an issue where a task launched by Marathon with a Docker container 
shows "Task is in RUNNING status" in the Mesos UI, but the task cannot be found 
on the Mesos agent host. In other words, the Docker container no longer exists, 
yet the task is still shown as RUNNING in the Mesos UI.


Part of the log is attached below:

```
I0627 14:31:30.239467  3913 slave.cpp:1912] Asked to kill task 
tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 of 
framework 20141201-145651-1900714250-5050-3484-0000
W0627 14:31:30.239547  3913 slave.cpp:2025] Ignoring kill task 
tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799 
because the executor 
'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of 
framework 20141201-145651-1900714250-5050-3484-0000 at 
executor(1)@10.153.96.22:14578 is terminating/terminated


I0624 14:46:04.398646  3921 slave.cpp:4511] Sending reconnect request to 
executor 
'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of 
framework 20141201-145651-1900714250-5050-3484-0000 at 
executor(1)@10.153.96.22:14578

I0624 14:46:06.399073  3899 slave.cpp:2991] Killing un-reregistered executor 
'tanmenggang.router-web.jylt-online02.532b8817-391f-11e6-93b3-56847afe9799' of 
framework 20141201-145651-1900714250-5050-3484-0000 at 
executor(1)@10.153.96.22:14578
I0624 14:46:06.399183  3899 slave.cpp:4571] Finished recovery
I0624 14:46:06.399375  3902 docker.cpp:1724] Destroying container 
'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'
I0624 14:46:06.399431  3902 docker.cpp:1852] Running docker stop on container 
'fa37fc7c-7ef1-478a-81a2-cae38ab3e4cb'

``` 

What is the root cause? It appears the executor of that task has already 
terminated, but the kill request for the task is ignored by the slave.
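
For reference, here is a minimal, hypothetical sketch (not the actual Mesos 
slave.cpp code; all names are illustrative) of the kind of agent-side check that 
produces the "Ignoring kill task ... because the executor ... is 
terminating/terminated" warning seen in the log above:

```cpp
// Illustrative sketch only -- NOT the real Mesos agent implementation.
#include <iostream>
#include <string>

enum class ExecutorState { REGISTERING, RUNNING, TERMINATING, TERMINATED };

struct Executor {
  std::string id;
  ExecutorState state;
};

// Hypothetical kill-task handler: drop the kill when the executor is already
// going away, otherwise forward it to the executor.
void killTask(const Executor& executor, const std::string& taskId) {
  if (executor.state == ExecutorState::TERMINATING ||
      executor.state == ExecutorState::TERMINATED) {
    // The agent assumes a terminal status update will follow once the
    // executor is reaped, so the kill is dropped here. If that terminal
    // update is never generated, the task can remain RUNNING in the UI
    // even though the container is gone.
    std::cout << "Ignoring kill task " << taskId
              << " because the executor '" << executor.id
              << "' is terminating/terminated" << std::endl;
    return;
  }

  std::cout << "Forwarding kill for task " << taskId
            << " to executor '" << executor.id << "'" << std::endl;
}

int main() {
  Executor executor{"router-web-executor", ExecutorState::TERMINATING};
  killTask(executor, "router-web-task");  // kill is ignored
  return 0;
}
```

If the assumption in that branch (a terminal status update will eventually 
arrive) is ever violated, the behavior reported here would be the result: the 
container is destroyed, but no terminal update reaches the master until the 
agent is restarted.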


FIX (workaround): After restarting mesos-slave, the RUNNING task transitions to 
FAILED status, and we can see it launched again on another agent; the task 
returns to normal.





> Mesos UI shows "Task is in RUNNING status" but can't find it on the Mesos 
> agent.
> --------------------------------------------------------------------------------
>
>                 Key: MESOS-5718
>                 URL: https://issues.apache.org/jira/browse/MESOS-5718
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: chenqiang
>            Assignee: chenqiang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
