[ 
https://issues.apache.org/jira/browse/MESOS-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363885#comment-16363885
 ] 

Qian Zhang commented on MESOS-8468:
-----------------------------------

commit 632ff7f7f8e32d3f9507e9199c8a253ff755224e
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:35:34 2018 +0800

Removed outdated executor-wide launched flag from the default executor.
 
 Review: https://reviews.apache.org/r/65616/

src/launcher/default_executor.cpp | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

commit 54b6c5b9c7cb059ebd87ee0f9927cfa6ff73129d
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:35:22 2018 +0800

Made the default executor treat agent disconnections more gracefully.
 
 This patch stops the default executor from shutting down when it has
 active child containers and either fails to connect or is not
 subscribed to the agent when it starts launching a task group.
 
 Review: https://reviews.apache.org/r/65556/

src/launcher/default_executor.cpp | 43 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

commit 656196eeca4ab6449c4b9f329b5b9cac2f69a885
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:35:17 2018 +0800

Added a regression test for MESOS-8468.
 
 Review: https://reviews.apache.org/r/65552/

src/tests/default_executor_tests.cpp | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

commit c3f3542e7ecce82cad8b75fdc2db14fe8c43a5da
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:35:11 2018 +0800

Stopped shutting down the whole default executor on task launch failure.

 The default executor would shut down completely on a
 `LAUNCH_NESTED_CONTAINER` failure.
 
 This patch makes it kill the affected task group instead of shutting
 down and killing all task groups.
 
 Review: https://reviews.apache.org/r/65551/

src/launcher/default_executor.cpp | 165 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------
 1 file changed, 103 insertions(+), 62 deletions(-)

commit 5c8852b244b09b4ae57e00abcd940482927d57e6
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:35:01 2018 +0800

Made default executor not shutdown if unsubscribed during task launch.
 
 The default executor would shut down unnecessarily if, while launching
 a task group, it was unsubscribed after having successfully launched
 the task group's containers.
 
 Review: https://reviews.apache.org/r/65550/

src/launcher/default_executor.cpp | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

commit 2e570b709dc7d15c73c8d728ef0b32e2416b0a08
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:34:56 2018 +0800

Improved some default executor log messages.
 
 Review: https://reviews.apache.org/r/65549/

src/launcher/default_executor.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

commit 29d1e4e1a1b894da78c2033f1932b282ee794f4b
Author: Gaston Kleiman <gas...@mesosphere.io>
Date: Wed Feb 14 14:34:50 2018 +0800

Added `Event::Update` and `v1::scheduler::TaskStatus` ostream operators.
 
 These operators make gtest print a human-readable representation of the
 protos on test failures.
 
 Review: https://reviews.apache.org/r/65548/

include/mesos/v1/mesos.hpp | 3 +++
 include/mesos/v1/scheduler/scheduler.hpp | 10 ++++++++++
 src/v1/mesos.cpp | 37 +++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)
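
To illustrate why such operators help, here is a minimal sketch. gtest
prints a value through `operator<<` when one is defined for its type, and
otherwise falls back to a byte dump. The `TaskStatus` struct, its fields,
the output format, and the `describe` helper below are all hypothetical
stand-ins for illustration; they are not the protobuf types or the format
used in the actual patch.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-in for a task status message.
// The real Mesos type is a protobuf; this struct is an assumption.
struct TaskStatus
{
  std::string taskId;
  std::string state;
};

// With this overload in scope, a test framework that streams values
// into its failure messages can print a readable representation.
std::ostream& operator<<(std::ostream& stream, const TaskStatus& status)
{
  return stream << "TaskStatus{task_id=" << status.taskId
                << ", state=" << status.state << "}";
}

// Hypothetical helper that renders the status through the overload,
// the same way a streaming-based printer would.
std::string describe(const TaskStatus& status)
{
  std::ostringstream out;
  out << status;
  return out.str();
}
```

Calling `describe` on a status renders the fields by name instead of an
opaque byte sequence, which is the kind of output the commit message
describes for test failures.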

> `LAUNCH_GROUP` failure tears down the default executor.
> -------------------------------------------------------
>
>                 Key: MESOS-8468
>                 URL: https://issues.apache.org/jira/browse/MESOS-8468
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Gastón Kleiman
>            Priority: Critical
>              Labels: default-executor, mesosphere
>             Fix For: 1.6.0, 1.5.1
>
>
> The following code in the default executor 
> (https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535)
>  shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a 
> fetcher failure), the whole executor will be shut down:
> {code:cpp}
> // Check if we received a 200 OK response for all the
> // `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
> // if this is not the case.
> foreach (const Response& response, responses.get()) {
>   if (response.code != process::http::Status::OK) {
>     LOG(ERROR) << "Received '" << response.status << "' ("
>                << response.body << ") while launching child container";
>     _shutdown();
>     return;
>   }
> }
> {code}
> Users do not expect this. A failed `LAUNCH_GROUP` should not affect other 
> task groups launched by the same executor, just as a task failure only 
> takes down its own task group. We should adjust the semantics so that a 
> failed `LAUNCH_GROUP` neither takes down the executor nor affects other 
> task groups.
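
The requested semantics can be sketched as follows. This is a simplified
illustration under stated assumptions, not the code from the actual patch:
`Response`, `Executor`, `killTaskGroup`, and `handleLaunchResponses` are
hypothetical names, and HTTP status codes are modelled as plain integers.
The point is only that a non-200 response kills the affected task group
and leaves the executor and its other task groups running.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an agent HTTP response.
struct Response
{
  int code;
  std::string body;
};

// Hypothetical, simplified executor state: task group id -> still active.
struct Executor
{
  std::map<std::string, bool> activeGroups;
  bool shutDown = false;

  // Marks a single task group as killed; other groups are untouched.
  void killTaskGroup(const std::string& groupId)
  {
    activeGroups[groupId] = false;
  }
};

// Checks the launch responses for one task group. Returns true if all
// of them are 200 OK. On the first failure, kills only that task group
// instead of shutting down the whole executor.
bool handleLaunchResponses(
    Executor& executor,
    const std::string& groupId,
    const std::vector<Response>& responses)
{
  // Precondition: the task group must be known to the executor.
  assert(executor.activeGroups.count(groupId) == 1);

  for (const Response& response : responses) {
    if (response.code != 200) {
      executor.killTaskGroup(groupId);
      return false;
    }
  }

  return true;
}
```

With two active groups, a failed launch for one of them leaves the other
marked active and the `shutDown` flag unset, in contrast to the quoted
snippet, which calls `_shutdown()` on the first failed response.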



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
