[ 
https://issues.apache.org/jira/browse/SLIDER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955365#comment-14955365
 ] 

Steve Loughran commented on SLIDER-870:
---------------------------------------

Without a stable client library for ATS, I don't want to go near this.

> use timeline server as a historical source of failure information
> -----------------------------------------------------------------
>
>                 Key: SLIDER-870
>                 URL: https://issues.apache.org/jira/browse/SLIDER-870
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>    Affects Versions: Slider 0.80
>            Reporter: Steve Loughran
>             Fix For: Slider 1.0.0
>
>
> We lose failure history when an AM dies; this hurts reporting and doesn't 
> allow the collection of long-term statistics.
> We can use the timeline server for this information, saving events on 
> failure, then querying it on AM restart to rebuild that history & re-use it 
> in decision making. 
> The saved events can also be presented to the user (a) in the web UI and 
> (b) from the command line, even while a cluster is not running.
> Finally, stats on node failures could be aggregated across applications, 
> possibly even across users. This would identify hotspots for node 
> unreliability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
