Benjamin Hindman created MESOS-3544:
---------------------------------------

             Summary: Support task and/or executor restart on failure.
                 Key: MESOS-3544
                 URL: https://issues.apache.org/jira/browse/MESOS-3544
             Project: Mesos
          Issue Type: Bug
          Components: HTTP API, master, slave
            Reporter: Benjamin Hindman


In certain instances it might be preferable to restart a task/executor in place 
after it fails (i.e., non-zero exit code) rather than going through an entire 
status update -> offer -> accept (launch) cycle just to relaunch the 
task/executor on the same machine. This is especially true if the resources are 
reserved (dynamically or statically).

Of course, we still want to highlight the restart to the framework, so 
introducing something like TASK_RESTARTED might be necessary (not sure what the 
analog would be for executors).

Finally, if the task/executor has a bug we don't want to keep restarting it in 
an infinite loop, so we'll likely want to introduce this functionality in such 
a way as to limit the total restart attempts (or require a framework to have 
the proper authority to restart forever).
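
To make the restart limit concrete, here is a rough sketch in Python (not Mesos code) of how a bounded in-place restart loop could behave. Everything in it is an assumption for illustration: `RestartPolicy`, `TaskSupervisor`, and `max_restarts` are hypothetical names, and `TASK_RESTARTED` is only the status proposed above, not an existing Mesos task state. Setting `max_restarts` to `None` stands in for a framework that has been granted the authority to restart forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RestartPolicy:
    # Hypothetical knob: maximum number of in-place restarts.
    # None models a framework authorized to restart forever.
    max_restarts: Optional[int] = 3


@dataclass
class TaskSupervisor:
    # Callable that runs the task once and returns its exit code.
    run_task: Callable[[], int]
    policy: RestartPolicy = field(default_factory=RestartPolicy)
    # Status updates that would be surfaced to the framework.
    updates: List[str] = field(default_factory=list)

    def supervise(self) -> str:
        restarts = 0
        self.updates.append("TASK_RUNNING")
        while True:
            exit_code = self.run_task()
            if exit_code == 0:
                self.updates.append("TASK_FINISHED")
                return "TASK_FINISHED"
            limit = self.policy.max_restarts
            if limit is not None and restarts >= limit:
                # Restart budget exhausted: report a terminal failure so the
                # framework falls back to the normal offer/accept cycle.
                self.updates.append("TASK_FAILED")
                return "TASK_FAILED"
            restarts += 1
            # Proposed (hypothetical) status: the restart is still visible
            # to the framework even though no new offer was needed.
            self.updates.append("TASK_RESTARTED")


# A task that fails twice and then succeeds.
exit_codes = iter([1, 1, 0])
recovering = TaskSupervisor(run_task=lambda: next(exit_codes))
recovering_state = recovering.supervise()

# A buggy task that always fails, with a budget of two restarts.
buggy = TaskSupervisor(run_task=lambda: 1, policy=RestartPolicy(max_restarts=2))
buggy_state = buggy.supervise()
```

In this sketch the buggy task is run three times in total (the initial launch plus two restarts) before the terminal failure is reported, so the framework sees each restart and the loop is guaranteed to end.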



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
