[ https://issues.apache.org/jira/browse/MESOS-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372580#comment-14372580 ]
Timothy Chen commented on MESOS-540:
------------------------------------

That health check was originally added by me, and it's not for this ticket. The health check program in that folder is a separate binary that is called to health check a running task and determine whether the task is healthy.

What this ticket is about is heartbeating the executor itself from the slave to ensure it's still up.

> Executor health checking.
> -------------------------
>
>                 Key: MESOS-540
>                 URL: https://issues.apache.org/jira/browse/MESOS-540
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Benjamin Mahler
>              Labels: twitter
>
> We currently do not health check running executors.
> At Twitter, this has led to out-of-band health checking of executors for an
> internal framework.
> For the Storm framework, this has led to out-of-band health checking via
> ZooKeeper. Health checking would allow Storm to use finer grained executors
> for better isolation.
> This also helps the Hadoop and Jenkins frameworks as well should health
> checking be desired.
> As for implementation, I would propose adding a call on the Executor
> interface:
> /**
>  * Invoked by the ExecutorDriver to determine the health of the executor.
>  * When this function returns, the Executor is considered healthy.
>  */
> void heartbeat(ExecutorDriver* driver) = 0;
> The driver can then heartbeat periodically and kill when the Executor is not
> responding to heartbeats. The driver should also detect the executor
> deadlocking on any of the other callbacks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
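
To make the heartbeat idea from the issue description concrete, here is a minimal sketch of the check a driver might run for each heartbeat. It is not the Mesos API: the `Executor` class below is a cut-down stand-in that drops the `ExecutorDriver*` argument, and `respondsToHeartbeat`, `ResponsiveExecutor`, and `StalledExecutor` are hypothetical names made up for illustration. The sketch only reports whether a heartbeat returned in time; what to do with an unresponsive executor (for example killing it, as the description suggests) is left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for the Executor interface. Only the proposed
// heartbeat() call is modeled, and the ExecutorDriver* argument is omitted.
class Executor {
public:
  virtual ~Executor() = default;

  // Returning from this call signals that the executor is healthy.
  virtual void heartbeat() = 0;
};

// Hypothetical example executor that answers immediately.
class ResponsiveExecutor : public Executor {
public:
  void heartbeat() override {}
};

// Hypothetical example executor that is stuck for a while, standing in
// for one that is blocked inside another callback.
class StalledExecutor : public Executor {
public:
  void heartbeat() override {
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
  }
};

// Hypothetical driver-side helper: sends one heartbeat and reports
// whether the executor answered within `timeout`.
bool respondsToHeartbeat(std::shared_ptr<Executor> executor,
                         std::chrono::milliseconds timeout) {
  assert(executor != nullptr);

  auto done = std::make_shared<std::promise<void>>();
  std::future<void> answered = done->get_future();

  // Run the heartbeat on its own thread so that a hung executor cannot
  // block the caller. The thread shares ownership of the executor and
  // the promise, so both stay alive even after this function returns.
  std::thread([executor, done] {
    executor->heartbeat();
    done->set_value();
  }).detach();

  return answered.wait_for(timeout) == std::future_status::ready;
}
```

A driver loop could call `respondsToHeartbeat` on a fixed period and treat one or more consecutive `false` results as a failed executor. A detached thread is used here instead of `std::async` because the future returned by `std::async` blocks in its destructor until the task finishes, which would defeat the timeout when the executor is truly hung.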