[
https://issues.apache.org/jira/browse/SLIDER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955365#comment-14955365
]
Steve Loughran commented on SLIDER-870:
---------------------------------------
Without a stable client library for ATS, I don't want to go near this.
> use timeline server as a historical source of failure information
> -----------------------------------------------------------------
>
> Key: SLIDER-870
> URL: https://issues.apache.org/jira/browse/SLIDER-870
> Project: Slider
> Issue Type: Sub-task
> Components: appmaster, client
> Affects Versions: Slider 0.80
> Reporter: Steve Loughran
> Fix For: Slider 1.0.0
>
>
> We lose failure history when an AM dies; this hurts reporting and doesn't
> allow the collection of long-term statistics.
> We can use the timeline server for this information, saving events on
> failure, then querying it on AM restart to rebuild that history & re-use it
> in decision making.
> That history can also be presented to the user (a) in the web UI and (b) from
> the command line, even while a cluster is not running.
> Finally, stats on node failures could be aggregated across applications,
> possibly even across users. This would identify hotspots for node
> unreliability.
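
For illustration only, the save-on-failure / query-on-restart flow described
above could look roughly like the sketch below. Every name in it
(FailureEvent, FailureEventStore, InMemoryFailureEventStore, FailureHistory)
is hypothetical: none of them is a Slider or YARN API, and the in-memory store
merely stands in for the timeline server so the example runs without a
cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical failure record; not a Slider or YARN type. */
final class FailureEvent {
    final String appId;
    final String node;
    final long timestamp;
    final String reason;

    FailureEvent(String appId, String node, long timestamp, String reason) {
        this.appId = appId;
        this.node = node;
        this.timestamp = timestamp;
        this.reason = reason;
    }
}

/** Hypothetical storage abstraction standing in for the timeline server. */
interface FailureEventStore {
    void save(FailureEvent event);

    List<FailureEvent> query(String appId);
}

/** In-memory stand-in so the sketch runs without any cluster services. */
final class InMemoryFailureEventStore implements FailureEventStore {
    private final List<FailureEvent> events = new ArrayList<>();

    @Override
    public void save(FailureEvent event) {
        events.add(event);
    }

    @Override
    public List<FailureEvent> query(String appId) {
        List<FailureEvent> result = new ArrayList<>();
        for (FailureEvent e : events) {
            if (e.appId.equals(appId)) {
                result.add(e);
            }
        }
        return result;
    }
}

/** Per-node failure counts that a restarted AM could rebuild from the store. */
final class FailureHistory {
    private final Map<String, Integer> failuresByNode = new HashMap<>();

    /** Replays every stored event for one application into fresh counters. */
    static FailureHistory rebuild(FailureEventStore store, String appId) {
        FailureHistory history = new FailureHistory();
        for (FailureEvent e : store.query(appId)) {
            history.failuresByNode.merge(e.node, 1, Integer::sum);
        }
        return history;
    }

    int failuresOn(String node) {
        return failuresByNode.getOrDefault(node, 0);
    }
}
```

Keeping the store behind an interface is a deliberate choice in this sketch:
the rebuild logic does not depend on any particular ATS client library, so a
real implementation could be swapped in later.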