[ https://issues.apache.org/jira/browse/AURORA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013097#comment-14013097 ]
Bill Farner commented on AURORA-493:
------------------------------------
[~drobinson] any chance this might be your first contribution? :-)
> expose accurate metrics of state transitions
> --------------------------------------------
>
> Key: AURORA-493
> URL: https://issues.apache.org/jira/browse/AURORA-493
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: David Robinson
> Priority: Minor
>
> The task store metrics (task_store_*) exposed via http://localhost:8081/vars
> aren't accurate enough to be used for alerting purposes. At first glance the
> task_store_* metrics look like they could be used to alert on an increase in
> LOST tasks (task_store_LOST), among other things, but the numbers actually
> decrease as tasks are pruned. When a task becomes lost, task_store_LOST is
> incremented, but it is decremented again when the lost task is pruned. If
> both the increment and the decrement occur within an alerting system's
> polling interval, the lost task(s) will not be captured.
> Consider adding counters of task state transitions that aren't touched when
> tasks are pruned. They should show the total number of tasks that have
> transitioned through, or terminated in, each state.
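
To illustrate the difference described above, here is a minimal Python sketch
(not Aurora code; the class, method, and task names are hypothetical). It keeps
a per-state gauge that behaves like task_store_* alongside a per-state counter
that only ever increases, and shows that pruning a lost task resets the gauge
but leaves the counter intact.

```python
from collections import Counter


class TaskStateStats:
    """Hypothetical model: a per-state gauge plus a monotonic per-state counter."""

    def __init__(self):
        self.stored = Counter()       # gauge: tasks currently in the store, by state
        self.transitions = Counter()  # counter: total transitions into each state
        self._state_by_task = {}      # current state of each stored task

    def transition(self, task_id, new_state):
        old_state = self._state_by_task.get(task_id)
        if old_state == new_state:
            return  # not a transition, so nothing to record
        if old_state is not None:
            self.stored[old_state] -= 1
        self._state_by_task[task_id] = new_state
        self.stored[new_state] += 1
        self.transitions[new_state] += 1  # never decremented anywhere

    def prune(self, task_id):
        # Pruning removes the task from the store and lowers the gauge,
        # but deliberately leaves the transition counters untouched.
        state = self._state_by_task.pop(task_id, None)
        if state is not None:
            self.stored[state] -= 1


stats = TaskStateStats()
stats.transition("task-1", "RUNNING")
stats.transition("task-1", "LOST")
stats.prune("task-1")  # pruned before the alerting system polls again

print(stats.stored["LOST"])       # 0: the gauge no longer shows the lost task
print(stats.transitions["LOST"])  # 1: the counter still records it
```

An alerting system polling the counter can alert on its rate of increase
regardless of when pruning happens, since the value never goes down between
polls.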