[ https://issues.apache.org/jira/browse/MESOS-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911920#comment-13911920 ]

Benjamin Mahler commented on MESOS-1033:
----------------------------------------

We really need to invest in building out our Statistics abstraction in 
libprocess (or adopting an existing one) so that adding a statistic is trivial 
and statistics are exposed in an industry-standard way that existing 
monitoring systems can consume easily. Tim filed MESOS-780 to express this 
desire for interoperability, and I've just filed MESOS-1036 for what we'll 
need to make it trivial to expose a statistic from any part of the code.
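To make "trivial to expose a statistic from any part of the code" concrete, here is a minimal sketch of a process-wide registry of named counters that renders itself as flat JSON for scraping. It is only an illustration under stated assumptions: the class name `StatsRegistry`, its methods `counter()` and `toJson()`, and the counter names used below are hypothetical and are not libprocess APIs; JSON is assumed as the export format.

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical process-wide registry of named counters (not a libprocess API).
class StatsRegistry {
public:
  static StatsRegistry& instance() {
    static StatsRegistry registry;  // Initialization is thread-safe since C++11.
    return registry;
  }

  // Returns the counter registered under `name`, creating it (at zero) on
  // first use, so any call site can add a statistic with a single line.
  std::atomic<std::uint64_t>& counter(const std::string& name) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& slot = counters_[name];
    if (!slot) {
      slot = std::make_unique<std::atomic<std::uint64_t>>(0);
    }
    return *slot;
  }

  // Renders all counters as a flat JSON object sorted by name, e.g.
  // {"a":1,"b":2}. Assumes names need no JSON escaping.
  std::string toJson() const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::ostringstream out;
    out << "{";
    bool first = true;
    for (const auto& [name, value] : counters_) {
      if (!first) out << ",";
      first = false;
      out << "\"" << name << "\":" << value->load();
    }
    out << "}";
    return out.str();
  }

private:
  StatsRegistry() = default;
  mutable std::mutex mutex_;
  // unique_ptr keeps each atomic at a stable address for returned references.
  std::map<std::string, std::unique_ptr<std::atomic<std::uint64_t>>> counters_;
};
```

A call site would then need only something like `StatsRegistry::instance().counter("slave/executor_registration_timeouts")++;`. Returning a reference to an atomic keeps the hot path lock-free after the first lookup; the mutex only guards registration and export.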

> Create a stat for executors timing out on registration
> ------------------------------------------------------
>
>                 Key: MESOS-1033
>                 URL: https://issues.apache.org/jira/browse/MESOS-1033
>             Project: Mesos
>          Issue Type: Improvement
>          Components: statistics
>            Reporter: Vinod Kone
>             Fix For: 0.19.0
>
>
> At Twitter we have seen cases where a slave host went into a bad state 
> (possibly a kernel bug), which blocked the isolator/containerizer and 
> prevented executors from being launched.
> It would be nice to have a stat that exposes the number of executors being 
> killed due to registration timeout, so that we can alert on this behavior.
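A minimal sketch of the requested stat and a matching alert rule follows, assuming the slave tracks each launched executor with its launch time and a registered flag. Everything here is hypothetical and for illustration only: `PendingExecutor`, `RegistrationStats`, `reapUnregistered()`, and `shouldAlert()` are not Mesos code, and the threshold-on-delta alert is just one possible rule.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical record of an executor the slave launched, which may not yet
// have registered back with the slave.
struct PendingExecutor {
  std::string id;
  Clock::time_point launchedAt;
  bool registered = false;
};

// Hypothetical holder for the stat; a real slave would publish it through
// its statistics endpoint instead of a plain struct member.
struct RegistrationStats {
  std::uint64_t registrationTimeouts = 0;
};

// Removes executors still unregistered `timeout` after launch, counting each
// one as a registration timeout. Returns the IDs that would be killed; all
// other executors (registered or still within the timeout) are kept.
std::vector<std::string> reapUnregistered(
    std::vector<PendingExecutor>& executors,
    Clock::time_point now,
    Clock::duration timeout,
    RegistrationStats& stats) {
  std::vector<std::string> killed;
  std::vector<PendingExecutor> remaining;
  for (auto& executor : executors) {
    if (!executor.registered && now - executor.launchedAt >= timeout) {
      killed.push_back(executor.id);
      ++stats.registrationTimeouts;
    } else {
      remaining.push_back(std::move(executor));
    }
  }
  executors = std::move(remaining);
  return killed;
}

// Fires when the counter grew by at least `threshold` between two samples,
// e.g. a burst of timeouts on one host whose isolator is blocked.
bool shouldAlert(std::uint64_t previous, std::uint64_t current,
                 std::uint64_t threshold) {
  return current >= previous && current - previous >= threshold;
}
```

Alerting on the change between samples, rather than on the absolute count, keeps a host that accumulated a few timeouts over weeks from paging anyone, while a sudden burst (as in the kernel-bug case above) still trips the rule.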



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
