[ 
https://issues.apache.org/jira/browse/MESOS-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jack Crawford updated MESOS-7771:
---------------------------------
    Description: 
I run a cluster of 50-100 machines with a single Mesos master. Occasionally, 
the master gets into a state where it continually forwards status 
updates for the same tasks until all work on the cluster grinds to a halt.

Checking the logs, I see numerous `Forwarding status update TASK_FINISHED ...` 
messages for the same task ID. Additionally, the "Active Tasks" section of the 
web UI gradually fills up with tasks in finished/failed states 
instead of staging/running.

I'm not sure what causes this, but it appears to occur rarely, when the cluster 
is under heavy scheduling load (sometimes hundreds to thousands of jobs 
scheduled in under a minute).

Restarting the Mesos master fixes the problem for the next week or two.
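
A rough way to quantify the repeated messages is to count forwarding lines per 
task in the master log. The sketch below assumes each line looks roughly like 
`Forwarding status update TASK_FINISHED ... for task <task-id> of framework <framework-id>`; 
that layout is an assumption and may differ between Mesos versions, and the 
function names are only illustrative.

```python
import re
from collections import Counter

# Assumed line layout (not confirmed for every Mesos version):
#   ... Forwarding status update TASK_FINISHED ... for task <task-id> of framework <framework-id>
FORWARD_RE = re.compile(
    r"Forwarding status update (?P<state>TASK_[A-Z_]+)\b.*? "
    r"for task (?P<task>\S+) of framework (?P<framework>\S+)"
)


def count_forwards(lines):
    """Count forwarded status updates per (framework, task, state)."""
    counts = Counter()
    for line in lines:
        match = FORWARD_RE.search(line)
        if match:
            key = (match["framework"], match["task"], match["state"])
            counts[key] += 1
    return counts


def repeated_forwards(lines, threshold=2):
    """Return entries forwarded at least `threshold` times, most frequent first."""
    return [
        (key, n)
        for key, n in count_forwards(lines).most_common()
        if n >= threshold
    ]
```

Passing an open log file to `repeated_forwards` (a file object iterates line by 
line) should list the task IDs whose terminal updates keep being forwarded, 
provided the assumed line layout matches.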

  was:
I run a cluster of 50-100 machines with a single Mesos master. Occasionally, 
the master gets into a state where it continually forwards status 
updates for the same tasks until all work on the cluster grinds to a halt.

Checking the logs, I see numerous `Forwarding status update TASK_FINISHED ...` 
messages for the same task ID. Additionally, the "Active Tasks" section of the 
web UI gradually fills up with tasks in finished/failed states 
instead of staging/running.

I'm not sure what causes this, but it appears to occur rarely, when the cluster 
is under heavy scheduling load (sometimes hundreds to thousands of jobs 
scheduled in under a minute).


> mesos master continually forwards status updates for finished tasks
> -------------------------------------------------------------------
>
>                 Key: MESOS-7771
>                 URL: https://issues.apache.org/jira/browse/MESOS-7771
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jack Crawford
>
> I run a cluster of 50-100 machines with a single Mesos master. Occasionally, 
> the master gets into a state where it continually forwards status 
> updates for the same tasks until all work on the cluster grinds to a halt.
>
> Checking the logs, I see numerous `Forwarding status update TASK_FINISHED 
> ...` messages for the same task ID. Additionally, the "Active Tasks" section 
> of the web UI gradually fills up with tasks in finished/failed 
> states instead of staging/running.
>
> I'm not sure what causes this, but it appears to occur rarely, when the cluster 
> is under heavy scheduling load (sometimes hundreds to thousands of jobs 
> scheduled in under a minute).
>
> Restarting the Mesos master fixes the problem for the next week or two.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
