[ https://issues.apache.org/jira/browse/MESOS-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911920#comment-13911920 ]
Benjamin Mahler commented on MESOS-1033:
----------------------------------------
We really need to invest in building out (or adopting an existing) Statistics
abstraction in libprocess, to make it trivial to add statistics and to expose
them in an industry-standard way, so that existing monitoring systems can
consume the data easily. Tim filed MESOS-780 to express this desire for
interoperability; I've just filed MESOS-1036 for what we'll need to make it
trivial to expose a statistic from any part of the code.
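
To make the idea concrete, here is a minimal sketch of what "trivial to expose
a statistic from any part of the code" could look like. Everything in it is an
assumption for illustration: Counter, StatisticsRegistry, snapshot() and the
statistic name are hypothetical and are not the libprocess API, and JSON is
only one possible reading of "industry standard".

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical sketch only: none of these names exist in libprocess.

// A monotonically increasing counter that is safe to bump from any thread.
class Counter {
public:
  Counter() : value_(0) {}

  void increment() { value_.fetch_add(1, std::memory_order_relaxed); }

  uint64_t value() const { return value_.load(std::memory_order_relaxed); }

private:
  std::atomic<uint64_t> value_;
};

// A registry that hands out named counters and can dump all of them at once.
class StatisticsRegistry {
public:
  // Returns the counter registered under `name`, creating it on first use.
  std::shared_ptr<Counter> counter(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::shared_ptr<Counter>& slot = counters_[name];
    if (!slot) {
      slot = std::make_shared<Counter>();
    }
    return slot;
  }

  // Renders every counter as a flat JSON object, ordered by name.
  // Assumes names contain no characters that need JSON escaping.
  std::string snapshot() const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::ostringstream out;
    out << "{";
    bool first = true;
    for (const auto& entry : counters_) {
      if (!first) {
        out << ",";
      }
      first = false;
      out << "\"" << entry.first << "\":" << entry.second->value();
    }
    out << "}";
    return out.str();
  }

private:
  mutable std::mutex mutex_;
  std::map<std::string, std::shared_ptr<Counter>> counters_;
};
```

With something along these lines, the slave could look up a counter such as
"slave/executor_registration_timeouts" (a made-up name) once, call increment()
wherever it kills an executor that failed to register in time, and serve
snapshot() from an HTTP endpoint for monitoring systems to scrape.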
> Create a stat for executors timing out on registration
> ------------------------------------------------------
>
> Key: MESOS-1033
> URL: https://issues.apache.org/jira/browse/MESOS-1033
> Project: Mesos
> Issue Type: Improvement
> Components: statistics
> Reporter: Vinod Kone
> Fix For: 0.19.0
>
>
> At Twitter we have seen cases where a slave host went into a bad state
> (possibly a kernel bug) that blocked the isolator/containerizer, so
> executors could not be launched.
> It would be nice to have a stat that exposes the number of executors
> killed due to registration timeout, so that we can alert on this behavior.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)