[ https://issues.apache.org/jira/browse/MESOS-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
haosdent updated MESOS-540:
---------------------------
    Labels: health-check twitter  (was: twitter)

> Executor health checking.
> -------------------------
>
>                 Key: MESOS-540
>                 URL: https://issues.apache.org/jira/browse/MESOS-540
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Benjamin Mahler
>              Labels: health-check, twitter
>
> We currently do not health check running executors.
> At Twitter, this has led to out-of-band health checking of executors for an
> internal framework.
> For the Storm framework, this has led to out-of-band health checking via
> ZooKeeper. Health checking would allow Storm to use finer-grained executors
> for better isolation.
> This would also help the Hadoop and Jenkins frameworks, should health
> checking be desired.
> As for implementation, I would propose adding a call on the Executor
> interface:
>   /**
>    * Invoked by the ExecutorDriver to determine the health of the executor.
>    * When this function returns, the Executor is considered healthy.
>    */
>   virtual void heartbeat(ExecutorDriver* driver) = 0;
> The driver can then heartbeat periodically and kill the Executor when it is
> not responding to heartbeats. The driver should also detect the executor
> deadlocking on any of the other callbacks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
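
The heartbeat proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the Mesos implementation: ExecutorDriver and Executor here are minimal stand-ins for the real Mesos classes, and respondsWithin, HealthyExecutor and StuckExecutor are hypothetical names introduced for the example. The idea is that the driver invokes heartbeat() on a separate thread and treats the executor as unhealthy if the call has not returned within a timeout.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>

// Stand-in for the Mesos ExecutorDriver (assumption: the real class is far richer).
class ExecutorDriver {};

// Stand-in for the Mesos Executor interface, reduced to the proposed callback.
class Executor {
public:
  virtual ~Executor() = default;
  // Proposed in the issue: returning from this call signals "healthy".
  virtual void heartbeat(ExecutorDriver* driver) = 0;
};

// Hypothetical helper: returns true if heartbeat() returned within `timeout`.
// The call runs on a detached thread so a hung executor cannot block the caller.
// Exceptions thrown by heartbeat() are not handled in this sketch.
bool respondsWithin(std::shared_ptr<Executor> executor,
                    ExecutorDriver* driver,
                    std::chrono::milliseconds timeout) {
  assert(executor != nullptr);
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();
  // The lambda copies the shared_ptrs, keeping executor and promise alive
  // even if the caller gives up waiting.
  std::thread([executor, driver, done] {
    executor->heartbeat(driver);
    done->set_value();
  }).detach();
  return finished.wait_for(timeout) == std::future_status::ready;
}

// Hypothetical executor that answers immediately.
class HealthyExecutor : public Executor {
public:
  void heartbeat(ExecutorDriver*) override {}
};

// Hypothetical executor that simulates a hang by sleeping for `delay`.
class StuckExecutor : public Executor {
public:
  explicit StuckExecutor(std::chrono::milliseconds delay) : delay_(delay) {}
  void heartbeat(ExecutorDriver*) override {
    std::this_thread::sleep_for(delay_);
  }
private:
  std::chrono::milliseconds delay_;
};
```

A driver built along these lines would call respondsWithin periodically and kill the executor once it returns false. Detecting a deadlock in the other callbacks, as the issue also asks for, would need the heartbeat to be serialized with those callbacks, which this sketch does not attempt.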