[ https://issues.apache.org/jira/browse/AURORA-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013097#comment-14013097 ]

Bill Farner commented on AURORA-493:
------------------------------------

[~drobinson] any chance this might be your first contribution? :-)

> expose accurate metrics of state transitions
> --------------------------------------------
>
>                 Key: AURORA-493
>                 URL: https://issues.apache.org/jira/browse/AURORA-493
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: David Robinson
>            Priority: Minor
>
> The task store metrics (task_store_*) exposed via http://localhost:8081/vars
> aren't accurate enough to be used for alerting purposes. At first glance the
> task_store_* metrics look like they could be used to alert on an increase in
> LOST tasks (task_store_LOST), among other things, but the numbers actually
> decrease as tasks are pruned. If a task becomes lost, task_store_LOST is
> incremented, but it's also decremented as lost tasks are pruned. Therefore, if
> both the increment and the decrement occur within an alerting system's polling
> interval, the lost task(s) will not be captured.
> Consider adding counters of task state transitions that aren't touched when
> tasks are pruned -- they should show the total number of tasks that have
> transitioned through, or terminated in, each state.
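
To illustrate the difference described above, here is a minimal sketch in Python (not Aurora's actual scheduler code). The class `TaskStateMetrics`, its methods, and the `transitions` counter are hypothetical names chosen for this example. Only the gauge is meant to mirror the behaviour of the existing task_store_* values, which drop again when tasks are pruned.

```python
from collections import Counter


class TaskStateMetrics:
    """Hypothetical sketch: a prunable gauge next to a monotonic counter."""

    def __init__(self):
        # Mirrors task_store_<STATE>: number of tasks currently stored per state.
        self.gauge = Counter()
        # Hypothetical new metric: incremented on every transition, never decremented.
        self.transitions = Counter()
        self._state = {}  # task id -> current state

    def transition(self, task_id, new_state):
        old_state = self._state.get(task_id)
        if old_state is not None:
            self.gauge[old_state] -= 1
        self._state[task_id] = new_state
        self.gauge[new_state] += 1
        self.transitions[new_state] += 1

    def prune(self, task_id):
        # Pruning removes the task from the store, so the gauge drops,
        # but the transition counter is deliberately left untouched.
        old_state = self._state.pop(task_id)
        self.gauge[old_state] -= 1


metrics = TaskStateMetrics()
metrics.transition("task-1", "RUNNING")
metrics.transition("task-1", "LOST")
metrics.prune("task-1")  # happens before the alerting system polls

print(metrics.gauge["LOST"], metrics.transitions["LOST"])
```

If the task is lost and pruned between two polls, the gauge is back at its previous value and the event is invisible, while the transition counter still records it. An alert could then be based on the counter's increase between polls.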



--
This message was sent by Atlassian JIRA
(v6.2#6252)
