[ 
https://issues.apache.org/jira/browse/MESOS-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-540:
---------------------------
    Labels: health-check twitter  (was: twitter)

> Executor health checking.
> -------------------------
>
>                 Key: MESOS-540
>                 URL: https://issues.apache.org/jira/browse/MESOS-540
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Benjamin Mahler
>              Labels: health-check, twitter
>
> We currently do not health check running executors.
> At Twitter, this has led to out-of-band health checking of executors for an 
> internal framework.
> For the Storm framework, this has led to out-of-band health checking via 
> ZooKeeper. Health checking would allow Storm to use finer grained executors 
> for better isolation.
> This would also help the Hadoop and Jenkins frameworks, should health 
> checking be desired.
> As for implementation, I would propose adding a call on the Executor 
> interface:
> /**
>  * Invoked by the ExecutorDriver to determine the health of the executor.
>  * When this function returns, the Executor is considered healthy.
>  */
> virtual void heartbeat(ExecutorDriver* driver) = 0;
> The driver can then heartbeat periodically and kill the Executor when it is 
> not responding to heartbeats. The driver should also detect the executor 
> deadlocking on any of the other callbacks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
