Benjamin Hindman created MESOS-3544:
---------------------------------------
Summary: Support task and/or executor restart on failure.
Key: MESOS-3544
URL: https://issues.apache.org/jira/browse/MESOS-3544
Project: Mesos
Issue Type: Bug
Components: HTTP API, master, slave
Reporter: Benjamin Hindman
In certain instances it might be preferable to restart a task/executor in place
after it fails (i.e., exits with a non-zero exit code) rather than going through
an entire status update -> offer -> accept (launch) cycle just to relaunch the
task/executor on the same machine. This is especially true if the resources are
reserved (dynamically or statically).
Of course, we still want to surface the restart to the framework, so
introducing something like a TASK_RESTARTED status might be necessary (not sure
what the analog would be for executors).
Finally, if the task/executor has a bug we don't want to restart it in an
infinite loop, so we'll likely want to introduce this functionality in such a
way as to limit the total number of restart attempts (or require a framework to
have the proper authority to restart indefinitely).
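To make the proposal concrete, here is a rough Python sketch of the bounded
restart loop described above. It is only an illustration, not Mesos code: the
function run_with_restarts, its max_restarts parameter and the report callback
are hypothetical names, and TASK_RESTARTED is the status proposed in this
ticket rather than an existing one. Passing max_restarts=None stands in for a
framework that has been granted the authority to restart forever.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence


def run_with_restarts(
    command: Sequence[str],
    max_restarts: Optional[int],
    report: Callable[[str], None],
) -> int:
    """Run `command`, restarting it in place whenever it exits non-zero.

    Hypothetical sketch: `max_restarts=None` means "no limit", and `report`
    stands in for sending a status update to the framework.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(list(command))
        if exit_code == 0:
            report("TASK_FINISHED")
            return exit_code
        if max_restarts is not None and restarts >= max_restarts:
            # Restart budget exhausted: give up and report a terminal failure.
            report("TASK_FAILED")
            return exit_code
        restarts += 1
        # Proposed (not yet existing) status so the framework sees the restart.
        report("TASK_RESTARTED")


# Usage: a command that always fails, limited to two restarts.
updates = []
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
final_code = run_with_restarts(always_fails, max_restarts=2, report=updates.append)
```

With a limit of two restarts the command runs three times in total, and the
framework would see two restart notifications followed by one terminal failure,
without any offer/accept round trip in between.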
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)