Chun-Hung Hsiao created MESOS-8468:
--------------------------------------

             Summary: `LAUNCH_GROUP` failure tears down the default executor.
                 Key: MESOS-8468
                 URL: https://issues.apache.org/jira/browse/MESOS-8468
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 1.4.0, 1.3.0, 1.2.0, 1.5.0
            Reporter: Chun-Hung Hsiao
            Assignee: Vinod Kone


The following code in the default executor 
(https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535)
 shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a 
fetcher failure), the whole executor is shut down:
{code:cpp}
// Check if we received a 200 OK response for all the
// `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
// if this is not the case.
foreach (const Response& response, responses.get()) {
  if (response.code != process::http::Status::OK) {
    LOG(ERROR) << "Received '" << response.status << "' ("
               << response.body << ") while launching child container";
    _shutdown();
    return;
  }
}
{code}

Users do not expect this. Instead, one would expect a failed 
`LAUNCH_GROUP` to leave other task groups launched by the same executor 
unaffected, just as a task failure only takes down its own task group. We 
should adjust the semantics so that a failed `LAUNCH_GROUP` neither takes down 
the executor nor affects other task groups.
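
As a rough illustration of the proposed semantics, here is a minimal standalone sketch. It is not the actual default executor code: `LaunchResponse`, `Executor`, `handleLaunchResponses`, and the "RUNNING"/"FAILED" state strings are hypothetical names invented for this example, and the sketch assumes the responses can be attributed to a single task group.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the HTTP response to one
// `LAUNCH_NESTED_CONTAINER` call; not the Mesos `Response` type.
struct LaunchResponse {
  int code;          // HTTP status code, 200 on success.
  std::string body;  // Error detail, e.g. a fetcher failure message.
};

// Hypothetical model of the executor's bookkeeping.
struct Executor {
  // Task group ID -> state ("RUNNING" or "FAILED").
  std::map<std::string, std::string> taskGroups;
  bool shuttingDown = false;
};

// Handles the responses for a single task group. Unlike the code quoted
// above, a non-200 response marks only `groupId` as failed and never sets
// `shuttingDown`. Returns true if every child container launched.
bool handleLaunchResponses(
    Executor& executor,
    const std::string& groupId,
    const std::vector<LaunchResponse>& responses)
{
  // Assumption of this sketch: a task group has at least one task.
  assert(!responses.empty());

  for (const LaunchResponse& response : responses) {
    if (response.code != 200) {
      std::cerr << "Received " << response.code << " (" << response.body
                << ") while launching child container of task group '"
                << groupId << "'" << std::endl;

      // Fail this task group only; other groups keep their state.
      executor.taskGroups[groupId] = "FAILED";
      return false;
    }
  }

  executor.taskGroups[groupId] = "RUNNING";
  return true;
}
```

With this shape, a failed launch for one group leaves previously launched groups in the "RUNNING" state and the executor alive. A real fix would presumably also have to clean up any child containers of the failed group that did launch and send the corresponding status updates, which this sketch leaves out.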



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
