[ 
https://issues.apache.org/jira/browse/MESOS-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175210#comment-14175210
 ] 

Dominic Hamon commented on MESOS-1830:
--------------------------------------

Once the messages are being sent, we should consider updating the statistic 
keys. We currently export:

*master*
{noformat}
master/tasks_failed: 23720,
master/tasks_finished: 41025,
master/tasks_killed: 4,
master/tasks_lost: 35,
master/tasks_running: 185,
master/tasks_staging: 15,
master/tasks_starting: 0,
{noformat}

*slave*
{noformat}
slave/tasks_failed: 93,
slave/tasks_finished: 126,
slave/tasks_killed: 0,
slave/tasks_lost: 0,
slave/tasks_running: 1,
slave/tasks_staging: 0,
slave/tasks_starting: 0,
{noformat}

After this change, we should consider adding the source. I don't think we 
should add the reason to the statistics. For the master, there is only one 
place ({{Master::updateTask}}) where we handle updates that do not have source 
{{SOURCE_MASTER}}, and we currently don't update {{metrics}} for these status 
updates. Note that we do update the old {{stats}}, so this may be an oversight.

Similarly, the slave updates task statuses in {{Slave::statusUpdate}}, and 
again we update {{stats}} but not {{metrics}} for these updates.

If we do include the source, which is preferable?

*source/task*
- {{master/source_master/tasks_failed}}
- {{slave/source_executor/tasks_lost}}

*task/source*
- {{master/tasks_failed/source_master}}
- {{slave/tasks_lost/source_executor}}
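
To make the trade-off concrete, here is a rough sketch of how the two layouts might be built and queried. It is not taken from the Mesos code: {{KeyLayout}}, {{statKey}} and {{sumByPrefix}} are hypothetical names made up for this comment, and the flat sorted map stands in for whatever the stats endpoint actually serves.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical selector for the two layouts under discussion.
enum class KeyLayout { SOURCE_THEN_TASK, TASK_THEN_SOURCE };

// Builds a statistic key from its three parts, for example
// "master/source_master/tasks_failed" or "master/tasks_failed/source_master".
std::string statKey(KeyLayout layout,
                    const std::string& component,  // "master" or "slave"
                    const std::string& source,     // e.g. "source_master"
                    const std::string& counter)    // e.g. "tasks_failed"
{
  assert(!component.empty() && !source.empty() && !counter.empty());

  if (layout == KeyLayout::SOURCE_THEN_TASK) {
    return component + "/" + source + "/" + counter;
  }
  return component + "/" + counter + "/" + source;
}

// Sums every counter whose key starts with `prefix`. Keys in a std::map are
// sorted, so all keys sharing a prefix are contiguous from lower_bound(prefix).
uint64_t sumByPrefix(const std::map<std::string, uint64_t>& stats,
                     const std::string& prefix)
{
  uint64_t total = 0;
  for (auto it = stats.lower_bound(prefix);
       it != stats.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    total += it->second;
  }
  return total;
}
```

With *task/source*, {{sumByPrefix(stats, "master/tasks_lost/")}} recovers the per-state total that {{master/tasks_lost}} reports today, summed across sources. With *source/task*, a prefix such as {{"master/source_master/"}} instead groups every task state attributed to one source. Which grouping consumers need more often is probably what should decide the layout.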


> Expose master stats differentiating between master-generated and 
> slave-generated LOST tasks
> -------------------------------------------------------------------------------------------
>
>                 Key: MESOS-1830
>                 URL: https://issues.apache.org/jira/browse/MESOS-1830
>             Project: Mesos
>          Issue Type: Story
>          Components: master
>            Reporter: Bill Farner
>            Assignee: Dominic Hamon
>            Priority: Minor
>
> The master exports a monotonically-increasing counter of tasks transitioned 
> to TASK_LOST.  This loses fidelity of the source of the lost task.  A first 
> step in exposing the source of lost tasks might be to just differentiate 
> between TASK_LOST transitions initiated by the master vs the slave (and maybe 
> bad input from the scheduler).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
