[ 
https://issues.apache.org/jira/browse/MESOS-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965447#comment-13965447
 ] 

Niklas Quarfot Nielsen commented on MESOS-922:
----------------------------------------------

We wanted to expand a bit on recovery, as executor infos will not be 
checkpointed up front and slave fail-over while the containerizer is launching 
needs to be addressed.
This is what I imagined (and have tested with Ben's new test 
https://reviews.apache.org/r/20179) in case of slave fail-over during launch:

1) In Framework::launch(), _both_ the work and meta directories are created for 
the executor. This requires us to know the container id up front, which 
prevents delegation of container id generation to the containerizer for now.

** Slave fails over **

2) There will be a single run in the executor meta directory, but no 
checkpointed executor info (see [1]). The recoverExecutor code does nothing 
special; it simply does not set the executor info if it is not present in 
State::info.

3) In Slave::_recover(), we detect whether the executor info is present. If 
not, we issue containerizer->destroy() _after_ wait().onAny(executorTerminated) 
has been set up. This makes sure that TASK_LOST updates are sent for the 
expected tasks:

containerizer->wait(executor->containerId)
  .onAny(defer(self(),
        &Self::executorTerminated,
        framework->id,
        executor->id,
        lambda::_1));

if (executor->info.isSome()) {
  // Monitor the executor.
  monitor.start(
      executor->containerId,
      executor->info.get(),
      flags.resource_monitoring_interval)
    .onAny(lambda::bind(_monitor,
                        lambda::_1,
                        framework->id,
                        executor->id,
                        executor->containerId));
} else {
  containerizer->destroy(executor->containerId);
  continue;
}


[1] This requires some changes to the current state code: executor recovery 
must try to recover runs even when the executor info isn't present. 
State::info is an option type and will simply not be set.
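
To make the branch in step 3 concrete, here is a minimal, self-contained sketch 
of the decision only. It is not the Mesos code: ExecutorInfo, RecoveredExecutor, 
Action, recoveryAction() and containersToDestroy() are hypothetical names made 
up for illustration, and std::optional (C++17) stands in for the option type 
mentioned in [1].

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the slave's recovered state; not the Mesos types.
struct ExecutorInfo {
  std::string command;
};

struct RecoveredExecutor {
  std::string containerId;
  // Unset when the slave failed over before the info was checkpointed.
  std::optional<ExecutorInfo> info;
};

enum class Action { MONITOR, DESTROY };

// Mirrors the branch in step 3: monitor executors whose info was recovered,
// destroy containers whose info is missing.
Action recoveryAction(const RecoveredExecutor& executor) {
  return executor.info.has_value() ? Action::MONITOR : Action::DESTROY;
}

// Returns the ids of the containers that would be destroyed during recovery.
std::vector<std::string> containersToDestroy(
    const std::vector<RecoveredExecutor>& executors) {
  std::vector<std::string> result;
  for (const RecoveredExecutor& executor : executors) {
    if (recoveryAction(executor) == Action::DESTROY) {
      result.push_back(executor.containerId);
    }
  }
  return result;
}
```

In the real slave the wait() callback is registered before the destroy is 
issued, so that the termination is observed and TASK_LOST can be sent; the 
sketch above models only which executors take which branch.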

> Containerizer to support launching tasks by TaskInfo
> ----------------------------------------------------
>
>                 Key: MESOS-922
>                 URL: https://issues.apache.org/jira/browse/MESOS-922
>             Project: Mesos
>          Issue Type: Improvement
>          Components: isolation
>            Reporter: Ian Downes
>            Assignee: Niklas Quarfot Nielsen
>             Fix For: 0.19.0
>
>
> Currently the slave runs tasks by using an existing executor or by launching 
> a new executor. When a task's TaskInfo doesn't specify the executor with an 
> ExecutorInfo (has a CommandInfo instead) the slave will create an 
> ExecutorInfo specifying the mesos-command executor.
> The decision on how to launch a task could instead be delegated to the 
> containerizer and the TaskInfo would be passed unmodified. This would have the 
> following advantages:
> 1) The containerizer can decide on the executor to run the task, either the 
> mesos-executor or a specialized executor appropriate for the containerization 
> implementation. Furthermore, the containerizer can allocate appropriate 
> additional resources for the executor.
> 2) The containerizer can see the task's resources and can allocate these when 
> the executor is launched. This is useful for such containerizer 
> implementations as KVM where it is harder to dynamically adjust a 
> containerized executor's resources. This is most applicable when the executor 
> will only run a single task.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
