[ https://issues.apache.org/jira/browse/MESOS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534077#comment-16534077 ]
Vinod Kone commented on MESOS-9052:
-----------------------------------
Instead of committing suicide, it should shut down the current task group,
since one task/container failing to launch shouldn't impact other task groups.
Also, should this apply more generally to all calls from the executor to the
agent, or just to launch?
cc [~gkleiman]
> Default executor should commit suicide if it cannot receive HTTP responses
> for LAUNCH_NESTED_CONTAINER calls.
> -------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-9052
> URL: https://issues.apache.org/jira/browse/MESOS-9052
> Project: Mesos
> Issue Type: Bug
> Components: executor
> Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
> Reporter: Chun-Hung Hsiao
> Priority: Major
>
> If there is a network problem (e.g., a routing problem), it is possible that
> the agent has received a {{LAUNCH_NESTED_CONTAINER}} call from the default
> executor and launched the nested container, but the executor never gets the
> HTTP response. This would leave tasks stuck at {{TASK_STARTING}} forever.
> We should consider making the default executor commit suicide if it does not
> receive the response in a reasonable amount of time.
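
The timeout idea discussed here can be sketched roughly as follows. This is a
minimal illustration in Python, not the default executor's actual C++ code:
every name in it (launch_with_timeout, unresponsive_send, LAUNCH_TIMEOUT_SECS)
is hypothetical, and the timeout value is an arbitrary assumption. It follows
the suggestion in the comment above by reporting that only the affected task
group should be shut down, rather than terminating the whole executor.

```python
import concurrent.futures
import time

# Hypothetical names throughout; none of these are Mesos APIs.
LAUNCH_TIMEOUT_SECS = 0.2  # assumed stand-in for "a reasonable amount of time"


def launch_with_timeout(send_call, task_group, timeout=LAUNCH_TIMEOUT_SECS):
    """Send a launch call and wait at most `timeout` seconds for its response.

    Returns "launched" if the response arrives in time, otherwise
    "task_group_shutdown" to signal that only `task_group` should be torn down.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(send_call, task_group)
    try:
        future.result(timeout=timeout)
        return "launched"
    except concurrent.futures.TimeoutError:
        # No response in time: the caller should shut down this task group
        # instead of leaving its tasks in TASK_STARTING indefinitely.
        return "task_group_shutdown"
    finally:
        # Do not block on the pending call; the worker finishes on its own.
        pool.shutdown(wait=False)


def unresponsive_send(task_group):
    """Simulate a call whose HTTP response is lost or badly delayed."""
    time.sleep(0.5)
```

In this sketch, a call such as launch_with_timeout(unresponsive_send, "group-b",
timeout=0.05) gives up after the timeout and leaves other task groups untouched.
Whether the same guard should wrap every executor-to-agent call, as asked in the
comment, is left open here.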
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)