[
https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175210#comment-14175210
]
Dominic Hamon commented on MESOS-1830:
--------------------------------------
Once the messages are being sent, we should consider updating the statistics
keys. We currently export:
*master*
{noformat}
master/tasks_failed: 23720,
master/tasks_finished: 41025,
master/tasks_killed: 4,
master/tasks_lost: 35,
master/tasks_running: 185,
master/tasks_staging: 15,
master/tasks_starting: 0,
{noformat}
*slave*
{noformat}
slave/tasks_failed: 93,
slave/tasks_finished: 126,
slave/tasks_killed: 0,
slave/tasks_lost: 0,
slave/tasks_running: 1,
slave/tasks_staging: 0,
slave/tasks_starting: 0,
{noformat}
After this change, we should consider adding the source to the keys. I don't
think we should add the reason to the statistics. For the master, there is only
one place ({{Master::updateTask}}) where we handle updates that do not have
source {{SOURCE_MASTER}}, and we currently don't update {{metrics}} for these
status updates. Note that we do update the old {{stats}}, so this may be an
oversight. Similarly, the slave updates task statuses in
{{Slave::statusUpdate}}, and again we update {{stats}} but not {{metrics}} for
these updates.
If we do include the source, which is preferable?
*source/task*
- {{master/source_master/tasks_failed}}
- {{slave/source_executor/tasks_lost}}
*task/source*
- {{master/tasks_failed/source_master}}
- {{slave/tasks_lost/source_executor}}
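To make the trade-off between the two layouts concrete, here is a minimal sketch of how each key could be built and what each ordering implies for aggregation. The helper names ({{sourceTaskKey}}, {{taskSourceKey}}, {{sumWithPrefix}}) are hypothetical and not part of Mesos, and a plain {{std::map}} stands in for whatever the metrics library actually uses.

```cpp
#include <map>
#include <string>

// Hypothetical helpers for illustration only; none of these exist in Mesos.

// Layout 1, source/task: <component>/<source>/<counter>
std::string sourceTaskKey(
    const std::string& component,
    const std::string& source,
    const std::string& counter)
{
  return component + "/" + source + "/" + counter;
}

// Layout 2, task/source: <component>/<counter>/<source>
std::string taskSourceKey(
    const std::string& component,
    const std::string& source,
    const std::string& counter)
{
  return component + "/" + counter + "/" + source;
}

// Sums every counter whose key starts with 'prefix'. In an ordered map all
// keys sharing a prefix are contiguous, so the scan starts at lower_bound
// and stops at the first key that no longer matches.
long sumWithPrefix(
    const std::map<std::string, long>& counters,
    const std::string& prefix)
{
  long total = 0;
  for (auto it = counters.lower_bound(prefix);
       it != counters.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    total += it->second;
  }
  return total;
}
```

Under the task/source layout, a prefix such as {{master/tasks_lost/}} selects one task state across all sources, so a total comparable to today's {{master/tasks_lost}} is a single prefix sum. Under the source/task layout, a prefix such as {{master/source_master/}} instead selects everything one source generated, across all task states.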
> Expose master stats differentiating between master-generated and
> slave-generated LOST tasks
> -------------------------------------------------------------------------------------------
>
> Key: MESOS-1830
> URL: https://issues.apache.org/jira/browse/MESOS-1830
> Project: Mesos
> Issue Type: Story
> Components: master
> Reporter: Bill Farner
> Assignee: Dominic Hamon
> Priority: Minor
>
> The master exports a monotonically-increasing counter of tasks transitioned
> to TASK_LOST. This loses fidelity of the source of the lost task. A first
> step in exposing the source of lost tasks might be to just differentiate
> between TASK_LOST transitions initiated by the master vs the slave (and maybe
> bad input from the scheduler).