Steven Schlansker created MESOS-2684:
----------------------------------------

             Summary: mesos-slave should not abort when a single task has, e.g., a 'mkdir' failure
                 Key: MESOS-2684
                 URL: https://issues.apache.org/jira/browse/MESOS-2684
             Project: Mesos
          Issue Type: Bug
          Components: slave
    Affects Versions: 0.21.1
            Reporter: Steven Schlansker


mesos-slave can encounter a variety of problems while attempting to launch a 
task.  If the task fails, that is unfortunate, but not the end of the world.  
Other tasks should not be affected.

However, if the task failure happens to trigger an assertion, the entire slave 
comes crashing down:

F0501 19:10:46.095464  1705 paths.hpp:342] CHECK_SOME(mkdir): No space left on device Failed to create executor directory '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'

Immediately afterwards, all tasks on this slave were declared TASK_LOST when 
mesos-slave restarted.

Something as simple as a 'mkdir' failing is not worthy of an assertion failure.



