[
https://issues.apache.org/jira/browse/MESOS-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jack Crawford updated MESOS-7771:
---------------------------------
Description:
I run a cluster of 50-100 machines with a single Mesos master. Occasionally,
the master gets into a state where it continually forwards status updates
for the same tasks until all work on the cluster grinds to a halt.
Checking the logs, I see numerous `Forwarding status update TASK_FINISHED ...`
messages for the same task ID. Additionally, the "Active Tasks" section of the
web UI gradually fills up with tasks in finished/failed states instead of
staging/running.
I'm not sure what causes this, but it appears to occur, rarely, when the
cluster is under heavy scheduling load (sometimes hundreds to thousands of
jobs scheduled in under a minute).
Restarting the Mesos master fixes the problem for the next week or two.
was:
I run a cluster of 50-100 machines with a single Mesos master. Occasionally,
the master gets into a state where it continually forwards status updates
for the same tasks until all work on the cluster grinds to a halt.
Checking the logs, I see numerous `Forwarding status update TASK_FINISHED ...`
messages for the same task ID. Additionally, the "Active Tasks" section of the
web UI gradually fills up with tasks in finished/failed states instead of
staging/running.
I'm not sure what causes this, but it appears to occur, rarely, when the
cluster is under heavy scheduling load (sometimes hundreds to thousands of
jobs scheduled in under a minute).
> mesos master continually forwards status updates for finished tasks
> -------------------------------------------------------------------
>
> Key: MESOS-7771
> URL: https://issues.apache.org/jira/browse/MESOS-7771
> Project: Mesos
> Issue Type: Bug
> Reporter: Jack Crawford
>
> I run a cluster of 50-100 machines with a single Mesos master. Occasionally,
> the master gets into a state where it continually forwards status updates
> for the same tasks until all work on the cluster grinds to a halt.
> Checking the logs, I see numerous `Forwarding status update TASK_FINISHED
> ...` messages for the same task ID. Additionally, the "Active Tasks" section
> of the web UI gradually fills up with tasks in finished/failed states
> instead of staging/running.
> I'm not sure what causes this, but it appears to occur, rarely, when the
> cluster is under heavy scheduling load (sometimes hundreds to thousands of
> jobs scheduled in under a minute).
> Restarting the Mesos master fixes the problem for the next week or two.
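
One way to quantify the repeated forwarding described above is to count
`Forwarding status update` log lines per task ID. The following is a minimal
sketch, not a tool shipped with Mesos. The report elides the rest of the log
line, so the layout matched here (`... <STATE> ... for task <task-id> ...`) is
an assumption and may need adjusting to the actual master log. The names
`FORWARD_RE`, `count_forwards`, and `repeated_forwards` are illustrative.

```python
import re
from collections import Counter

# ASSUMPTION: each forwarded update is logged on one line shaped roughly like
#   ... Forwarding status update TASK_FINISHED ... for task <task-id> ...
# The report elides everything after the state, so adjust this pattern to
# whatever the real master log contains.
FORWARD_RE = re.compile(
    r"Forwarding status update (?P<state>TASK_[A-Z_]+)\b.*?\bfor task (?P<task>\S+)"
)


def count_forwards(lines):
    """Count forwarded status updates per (task id, state) pair."""
    counts = Counter()
    for line in lines:
        match = FORWARD_RE.search(line)
        if match:
            counts[(match.group("task"), match.group("state"))] += 1
    return counts


def repeated_forwards(lines, threshold=2):
    """Return ((task id, state), count) pairs seen at least `threshold` times,
    most frequent first."""
    return [
        (key, n)
        for key, n in count_forwards(lines).most_common()
        if n >= threshold
    ]
```

Passing an open log file to `repeated_forwards` (file objects iterate line by
line) lists the tasks whose updates are forwarded repeatedly. A task that
shows a large and growing count for a terminal state such as TASK_FINISHED
would match the symptom reported here.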