[ https://issues.apache.org/jira/browse/MESOS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dusek updated MESOS-816:
------------------------------

    Description: 
Being able to delegate isolation to shell scripts could make it easier to 
leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker 
and similar containerization systems.

Why go through command line tools for isolation? We have seen many requests for 
isolation, covering a wide variety of scenarios:

- Setups requiring multiple versions of the same language (Ruby 1.8, Ruby 1.9).
- Setups requiring installation and configuration of RPM-packaged applications.
- Build-and-test setups, where sharing the environment of the host would impact 
reproducibility.
- Integration of third-party, service-oriented applications.
- Launching applications with Docker.
- Launching multiple instances of a Mesos framework that, like Hadoop, has 
significant system setup and dependencies.

To cover these and other use cases, it seems reasonable to allow Mesos to 
delegate to external programs for isolation:

- It makes it easier to experiment with new containerization tools.
- It allows for site administrators to customize containerization, or even 
implement new containerization mechanisms, without impacting their ability to 
keep pace with Mesos development.
- Many external programs exist for containerization -- Docker, LXC tools, 
LibVirt -- which handle a great deal of the book-keeping around finding and 
efficiently cloning disk images and setting up the guest system (its hostname, 
TTYs, /dev/*, /proc).

The scenarios listed above can be understood in terms of three use cases:

- The containerized system service scenario, wherein an application, installed 
with RPM or a similar tool, is started and managed by the init system within a 
container. Percona MySQL is an example of such an application.
- The containerized application scenario, wherein an application is installed 
or unpacked and then configured and launched in a single command. For example, 
running a custom Rails app with `bundle install && bundle exec rails`.
- The containerized framework/executor scenario, wherein the application is 
Spark, Hadoop or another Mesos framework/executor pair.

One way to achieve this could be to introduce an External Isolator, which works 
in parallel with the existing process/posix and cgroups isolators. The 
responsibility of this isolator would be to act as a thin layer to external 
isolators. Calls for task launching, stopping or any other resource change 
would be serialized and passed to the external isolators by the Mesos External 
Isolator. 
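
As a rough illustration of the thin-layer idea, the sketch below serializes a 
launch call and pipes it to an external program on stdin. The JSON encoding, 
the field names and the `delegate` helper are all assumptions made for this 
example; the proposal does not fix a wire format.

```python
import json
import subprocess
import sys


def serialize_call(action, task_id, resources):
    """Encode one isolator call as a JSON document (hypothetical format)."""
    call = {"action": action, "task_id": task_id, "resources": resources}
    return json.dumps(call, sort_keys=True).encode("utf-8")


def delegate(argv, payload):
    """Run the external isolator, feeding it the serialized call on stdin."""
    result = subprocess.run(argv, input=payload, stdout=subprocess.PIPE,
                            check=True)
    return result.stdout


# Stand-in for a real external isolator: a child process that echoes stdin.
ECHO_ISOLATOR = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read())"]
```

A real external isolator would replace `ECHO_ISOLATOR` with a site-provided 
script that reads the call and drives LXC, Docker or a similar tool.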

Allowing for pluggable isolators invites the possibility of having different 
isolators per task. For applications using containers, it's reasonable that 
each application or framework can specify a different base image; and this 
would be an option passed to the corresponding isolator. One can also imagine 
specialized frameworks that need to disable isolation entirely. For example, a 
"system backup" framework that would specify a null isolator to allow it to 
snapshot interesting data on each slave and transfer it to a sanctioned storage 
location.

However, having users and framework authors specify isolators would both harm 
portability and make isolation their problem, no longer something handled 
transparently by Mesos. Furthermore, it would have the unintended effect of 
putting them at odds with site administrators, who would also specify 
isolators -- as a command line option for each slave.

Allowing tasks to carry a more abstract notion of "container" with them would 
cover most of the application-level scenarios outlined above. Theoretically, 
more than one isolator might be able to handle a given container. For example, 
if the container is specified as an "ISO" and a distro LiveCD is provided, one 
could imagine a Docker isolator, LXC isolator or VirtualBox isolator handling 
it. Encouraging users and framework authors to specify a container would be 
simpler for them than specifying isolator flags, would allow them to document 
their intent more clearly, and would reduce the scope for conflict with other 
parties who have an interest in upgrading and tuning isolation. It also makes 
applications and command examples more portable, by decoupling the isolation 
mechanism from the desired container layout (which is, more or less, a chroot 
with some files in it).
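
To make the decoupling concrete, here is a minimal sketch of how a slave might 
pick an isolator for a given container type. The capability table and the 
function name are hypothetical, invented for illustration; which isolators 
could really handle which container types is left open by this proposal.

```python
# Hypothetical capability table: container type -> isolators able to handle it.
CAPABLE_ISOLATORS = {
    "docker": ["docker"],
    "lxc": ["lxc"],
    "iso": ["docker", "lxc", "virtualbox"],
}


def choose_isolator(container_type, site_preference):
    """Return the first isolator in the site's preference order that can
    handle the container type, or None if no configured isolator can."""
    capable = CAPABLE_ISOLATORS.get(container_type, [])
    for isolator in site_preference:
        if isolator in capable:
            return isolator
    return None
```

The task names only the container; the site administrator's preference list 
decides which mechanism actually runs it.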

To this end, we propose adding an optional ContainerInfo to each CommandInfo:

    message CommandInfo {
      message ContainerInfo {
        required bytes image = 1;
        repeated bytes options = 2;
      }

      ...

      optional ContainerInfo container = 4;
    }

The first field of the ContainerInfo should indicate the image, perhaps as a 
URL. For example:

    docker:///johncosta/redis
    iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso
    lxc:///ubuntu

The scheme of the URL -- recognizable as a string of letters and digits and 
perhaps plusses, dots and dashes preceding the first `://`, per RFC 3986 -- 
serves to indicate the type of the container, which isolators can use to 
determine both what to do with a container and how to obtain it. For the Docker 
URL type, for example, the absence of a host between the second and third 
slashes could be interpreted to mean that the image should be fetched from the 
Docker index or from a locally configured default Docker image server; whereas 
if a hostname is given, it is treated as the image server to use.
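
The scheme and host rules described above could be checked along these lines. 
This is only a sketch: the function name and the returned tuple shape are 
assumptions, and a real isolator might well prefer a full URL library.

```python
import re

# Scheme per RFC 3986: a letter, then letters, digits, "+", "-" or ".".
CONTAINER_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/]*)(/.*)?$")


def parse_container_url(url):
    """Split a container URL into (scheme, host, path)."""
    match = CONTAINER_URL.match(url)
    if match is None:
        raise ValueError("not a container URL: %r" % url)
    scheme, host, path = match.groups()
    # An empty host means "use the default image server" in this sketch.
    return scheme.lower(), host, path or ""
```

For `docker:///johncosta/redis` the host comes back empty, which an isolator 
could take as the signal to fall back to its default image server.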

The addition of "options" to the ContainerInfo poses a risk to portability and 
warrants both explanation and justification. In the case of Docker URLs, for 
example, it is possible to mount additional filesystems on the Docker command 
line; and these filesystems can even be indicated by reference to another 
Docker container by name. Support for this feature is clearly tied to the 
Docker URL and its meaning.
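
One way a Docker isolator might turn the image and options into a command line 
is sketched below. The helper name, the use of `/bin/sh -c` to run the task 
command (which assumes the image ships a shell), and the rule for folding a 
host into the image name are all assumptions for illustration, not part of the 
proposal.

```python
def docker_argv(image_url, options, command):
    """Build a `docker run` command line from a docker:// URL (a sketch)."""
    prefix = "docker://"
    if not image_url.startswith(prefix):
        raise ValueError("not a Docker URL: %r" % image_url)
    host, _, name = image_url[len(prefix):].partition("/")
    # With a host, prefix the image name with it; without, use the bare name.
    image = "%s/%s" % (host, name) if host else name
    # Options are passed through untouched, ahead of the image name.
    return ["docker", "run"] + list(options) + [image, "/bin/sh", "-c", command]
```

Because the options are passed through verbatim, they only make sense to an 
isolator that understands Docker's flags, which is exactly the portability 
risk noted above.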

When the default isolator for a slave is specified, there may also be a default 
container specified. It is good for us, then, that the ContainerInfo structure 
maps cleanly to an array of byte strings, since this is an easy thing to handle 
from the command line.
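
The mapping to an array of strings can be sketched as follows. The Python 
class merely mirrors the proposed message for illustration, and the 
`from_strings` / `to_strings` helpers are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerInfo:
    image: bytes
    options: List[bytes] = field(default_factory=list)

    @classmethod
    def from_strings(cls, strings):
        """First string is the image; any remaining strings are options."""
        if not strings:
            raise ValueError("a container needs at least an image")
        encoded = [s.encode("utf-8") for s in strings]
        return cls(image=encoded[0], options=encoded[1:])

    def to_strings(self):
        """Flatten back to the array form: image first, then options."""
        return [b.decode("utf-8") for b in [self.image] + self.options]
```

A default container given on a slave's command line could then be read as a 
plain list of arguments, image first.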

Now in practice, how will we use the ContainerInfo? In the three use cases 
outlined above -- service container, command container and containerized 
executor -- tasks needing a special container will specify an ExecutorInfo in 
the TaskInfo and not a bare CommandInfo. The ContainerInfo would then be part 
of the CommandInfo embedded in the ExecutorInfo.

To consider a specific case: if the Storm framework were packaged in a 
container, the same container could be used for both Nimbus and the worker 
nodes:

* Nimbus would be launched with a TaskInfo requesting the container and 
launching Nimbus.

        TaskInfo {
          executor = ExecutorInfo {
            command = CommandInfo {
              value = "python /opt/storm/bin/storm go"
              container = ContainerInfo {
                image = "docker:///storm-mesos/latest"
                options = [ "-p", "1337:8000" ]
              }
            }
            ...
          }
          ...
        }

* Nimbus would launch executors with a TaskInfo requesting the very same 
container, but specifying a different command.

        TaskInfo {
          executor = ExecutorInfo {
            command = CommandInfo {
              value = "curl -sSfL http://storm.server:1337/conf/storm.yaml -o /opt/storm/conf/storm.yaml && python /opt/storm/bin/storm supervisor storm.mesos.MesosSupervisor"
              container = ContainerInfo {
                image = "docker:///storm-mesos/latest"
              }
            }
            ...
          }
          ...
        }

While in the near term we expect container URLs to be pretty specific to the 
containerization mechanism, let us hope for a glorious future with URLs like 
`img:///ubuntu-13.04` that point to well-known, portable images.


  was:
Being able to delegate isolation to shell scripts could make it easier to 
leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker 
and similar containerization systems.

Why go through command line tools for isolation? We have seen many requests for 
isolation, covering a wide variety of scenarios:

- Setups requiring multiple versions of the same language (Ruby 1.8, Ruby 1.9).
- Setups requiring installation and configuration of RPM-packaged applications.
- Build-and-test setups, where sharing the environment of the host would impact 
reproducibility.
- Integration of 3rd party, service-oriented applications.
- Launching applications with Docker.
- Launching multiple instances of a Mesos framework that, like Hadoop, has 
significant system setup and dependencies.

To cover these and other use cases, it seems reasonable to allow Mesos to 
delegate to external programs for isolation:

- It makes it easier to experiment with new containerization tools.
- It allows for site administrators to customize containerization, or even 
implement new containerization mechanisms, without impacting their ability to 
keep pace with Mesos development.
- Many external programs exist for containerization -- Docker, LXC tools, 
LibVirt -- which handle a great deal of the book-keeping around finding and 
efficiently cloning disk images and setting up the guest system (its hostname, 
TTYs, /dev/*, /proc).

The scenarios listed above can be understood in terms of three use cases:

- The containerized system service scenario, wherein an application, installed 
with RPM or a similar tool, is started and managed by the init system within a 
container. Percona MySQL is an example of such an application.
- The containerized application scenario, wherein an application is installed 
or unpacked and then configured and launched in a single command. For example, 
running a custom Rails app with bundle install && bundle exec rails.
- The containerized framework/executor scenario, wherein the application is 
Spark, Hadoop or another Mesos framework/executor pair.

One way to achieve this could be to introduce an External Isolator, which works 
in parallel with the existing process/posix and cgroups isolators. The 
responsibility of this isolator would be to act as a thin layer to external 
isolators. Calls for task launching, stopping or any other resource change 
would be serialized and passed to the external isolators by the Mesos External 
Isolator. 

Allowing for pluggable isolators invites the possibility of having different
isolators per task. For applications using containers, it's reasonable that
each application or framework can specify a different base image; and this
would be an option passed to the corresponding isolator. One can also imagine
specialized frameworks that need to disable isolation entirely. For example, a
"system backup" framework that would specify a null isolator to allow it to
snapshot interesting data on each slave and transfer it to a sanctioned
storage location.

However, for users and framework authors to specify isolators would both be
harmful to portability and would make isolation their problem, no longer
something handled transparently by Mesos. Furthermore, it would have the
unintended effect of putting them at odds with site administrators, who would
also specify a default isolator, with a command line option for each slave.

Allowing tasks to carry a more abstract notion of "container" with them would
allow for most application level scenarios we've outlined above.
Theoretically, more than one isolator might be able to handle a given
container. For example if, the container is specified as an "ISO" and a distro
LiveCD is provided, one could imagine a Docker isolator, LXC isolator or
Virtualbox isolator handling it. Encouraging users and framework authors to
specify a container is simpler for them than specifying isolator flags,
allows them to more clearly document their intent, and reduces the scope for
conflict with other parties who have an interest in upgrading and tuning
isolation. It also makes applications and command examples more portable, by
decoupling the isolation mechanism from the desired container layout (which
is, more or less, a chroot with some files in it).

To this end, we propose adding an optional ContainerInfo to each CommandInfo:

    message CommandInfo {
      message ContainerInfo {
        required bytes image = 1;
        repeated bytes options = 2;
      }
      ...
      optional ContainerInfo container = 4;
    }

The first field of the ContainerInfo should indicate the image, perhaps as a
URL. For example:

    docker:///johncosta/redis
    iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso
    lxc:///ubuntu

The scheme of the URL -- recognizable as a string of letters and digits and
perhaps plusses, dots and dashes preceding the first `://`, per RFC 3986 --
serves to indicate the type of the container, which isolators can use to
determine both what to do with a container and how to obtain it. For the
Docker URL type, for example, the absence of a host between the second and
third slashes could be interpreted to mean that the image should be fetched
from the Docker index or from a locally configured default Docker image
server; whereas if a hostname is given, it is treated as the image server to
use.

The addition of "options" to the ContainerInfo poses a risk to portability and
warrants both explanation and justification. In the case of Docker URLs, for
example, it is possible to mount additional filesystems on the Docker command
line; and these filesystems can even be indicated by reference to another
Docker container by name. Support for this feature is clearly tied to the
Docker URL and its meaning.

When the default isolator for a slave is specified, there may also be a
default container specified. It is good for us, then, that the ContainerInfo
structure maps cleanly to an array of byte strings, since this is an easy
thing to handle from the command line.

Now in practice, how will we use the ContainerInfo? In the three use cases
outlined above -- service container, command container and containerized
executor -- tasks needing a special container will specify an ExecutorInfo
in the TaskInfo and not a bare CommandInfo. The ContainerInfo would then be
part of the CommandInfo embedded in the ExecutorInfo.

To consider a specific case, were the Storm framework packaged in a container,
then the same container could be used both for Nimbus and the worker nodes:

* Nimbus would be launched with a TaskInfo requesting the container and
  launching Nimbus.

        TaskInfo {
          executor = ExecutorInfo {
            command = CommandInfo {
              value = "python /opt/storm/bin/storm go"
              containerInfo = ContainerInfo {
                image = "docker:///storm-mesos/latest"
                options = [ "-p", "1337:8000" ]
              }
            }
            ...
          }
          ...
        }

* Nimbus would launch executors with a TaskInfo requesting the very same
  container, but specifying a different command.

        TaskInfo {
          executor = ExecutorInfo {
            command = CommandInfo {
              value = "curl -sSfL http://storm.server:1337/conf/storm.yaml -o 
/opt/storm/conf/storm.yaml && python /opt/storm/bin/storm supervisor 
storm.mesos.MesosSupervisor"
              containerInfo = ContainerInfo {
                image = "docker:///storm-mesos/latest"
              }
            }
            ...
          }
          ...
        }

While in the near term we expect container URLs to be pretty specific to the
containerization mechanism, let us hope for a glorious future with URLs like
`img:///ubuntu-13.04` that point to well-known, portable images.




> Allow delegation to shell scripts for isolation
> -----------------------------------------------
>
>                 Key: MESOS-816
>                 URL: https://issues.apache.org/jira/browse/MESOS-816
>             Project: Mesos
>          Issue Type: Improvement
>          Components: isolation, slave
>            Reporter: Jason Dusek
>            Priority: Minor
>         Attachments: mesos-shell-isolator.jpg
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
