[
https://issues.apache.org/jira/browse/MESOS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Dusek updated MESOS-816:
------------------------------
Description:
Being able to delegate isolation to shell scripts could make it easier to
leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker
and similar containerization systems.
Why go through command line tools for isolation? We have seen many requests for
isolation, covering a wide variety of scenarios:
- Setups requiring multiple versions of the same language (Ruby 1.8, Ruby 1.9).
- Setups requiring installation and configuration of RPM-packaged applications.
- Build-and-test setups, where sharing the environment of the host would impact
reproducibility.
- Integration of 3rd party, service-oriented applications.
- Launching applications with Docker.
- Launching multiple instances of a Mesos framework that, like Hadoop, has
significant system setup and dependencies.
To cover these and other use cases, it seems reasonable to allow Mesos to
delegate to external programs for isolation:
- It makes it easier to experiment with new containerization tools.
- It allows for site administrators to customize containerization, or even
implement new containerization mechanisms, without impacting their ability to
keep pace with Mesos development.
- Many external programs exist for containerization -- Docker, LXC tools,
LibVirt -- which handle a great deal of the book-keeping around finding and
efficiently cloning disk images and setting up the guest system (its hostname,
TTYs, /dev/*, /proc).
The scenarios listed above can be understood in terms of three use cases:
- The containerized system service scenario, wherein an application, installed
with RPM or a similar tool, is started and managed by the init system within a
container. Percona MySQL is an example of such an application.
- The containerized application scenario, wherein an application is installed
or unpacked and then configured and launched in a single command. For example,
running a custom Rails app with bundle install && bundle exec rails.
- The containerized framework/executor scenario, wherein the application is
Spark, Hadoop or another Mesos framework/executor pair.
One way to achieve this could be to introduce an External Isolator that works
alongside the existing process/posix and cgroups isolators. This isolator
would act as a thin layer in front of external isolation programs: calls to
launch or stop a task, or to change its resources, would be serialized and
passed to those programs by the Mesos External Isolator.
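As a rough illustration of that thin layer, the sketch below serializes one
call and pipes it to an external program. The JSON wire format, the field
names "action" and "payload", and the stand-in isolator program are all
assumptions made for this example; the proposal does not define any of them.

```python
import json
import subprocess
import sys


def delegate(program, action, payload):
    """Serialize one isolator call as JSON and hand it to an external program.

    `program` is an argv list. The JSON format and the field names
    ("action", "payload") are assumptions made for this sketch.
    """
    message = json.dumps({"action": action, "payload": payload})
    result = subprocess.run(
        program,
        input=message,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return result.stdout


# Hypothetical stand-in for a real isolation script: it only echoes the
# action it was asked to perform.
FAKE_ISOLATOR = [
    sys.executable,
    "-c",
    "import json, sys; print(json.load(sys.stdin)['action'])",
]

if __name__ == "__main__":
    print(delegate(FAKE_ISOLATOR, "launch", {"task_id": "t1", "cpus": 0.5}))
```

A real external isolator would replace FAKE_ISOLATOR with a shell script or
other program that drives LXC, Docker or a similar tool.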
Allowing for pluggable isolators invites the possibility of having different
isolators per task. For applications using containers, it's reasonable that
each application or framework can specify a different base image, which would
be passed as an option to the corresponding isolator. One can also imagine
specialized frameworks that need to disable isolation entirely: for example, a
"system backup" framework might specify a null isolator so that it can
snapshot interesting data on each slave and transfer it to a sanctioned
storage location.
However, letting users and framework authors specify isolators would harm
portability and would make isolation their problem rather than something
handled transparently by Mesos. It would also have the unintended effect of
putting them at odds with site administrators, who would specify a default
isolator with a command line option for each slave.
Allowing tasks to carry a more abstract notion of "container" with them would
cover most of the application-level scenarios outlined above.
Theoretically, more than one isolator might be able to handle a given
container. For example, if the container is specified as an "ISO" and a distro
LiveCD is provided, one could imagine a Docker isolator, LXC isolator or
VirtualBox isolator handling it. Encouraging users and framework authors to
specify a container is simpler for them than specifying isolator flags,
allows them to more clearly document their intent, and reduces the scope for
conflict with other parties who have an interest in upgrading and tuning
isolation. It also makes applications and command examples more portable, by
decoupling the isolation mechanism from the desired container layout (which
is, more or less, a chroot with some files in it).
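To make the decoupling concrete, here is a small sketch of how a slave might
pick an isolator for a container type while respecting the site
administrator's preferences. The capability table, the isolator names and the
function name are hypothetical; nothing like them exists in Mesos today.

```python
# Hypothetical table: container type -> isolators able to handle it.
CAPABLE = {
    "docker": ["docker"],
    "lxc": ["lxc"],
    "iso+http": ["docker", "lxc", "virtualbox"],
}


def choose_isolator(container_type, site_preference):
    """Return the first isolator in the administrator's preference order
    that can handle the container type, or None if none can."""
    capable = CAPABLE.get(container_type, [])
    for name in site_preference:
        if name in capable:
            return name
    return None


if __name__ == "__main__":
    # The task only names a container; the site decides how to isolate it.
    print(choose_isolator("iso+http", ["virtualbox", "lxc"]))
```

The task author never mentions an isolator here, so an administrator can
change the preference order without touching any framework.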
To this end, we propose adding an optional ContainerInfo to each CommandInfo:
    message CommandInfo {
      message ContainerInfo {
        required bytes image = 1;
        repeated bytes options = 2;
      }
      ...
      optional ContainerInfo container = 4;
    }
The first field of the ContainerInfo should indicate the image, perhaps as a
URL. For example:
    docker:///johncosta/redis
    iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso
    lxc:///ubuntu
The scheme of the URL -- the text preceding the first `://`, which per RFC 3986
begins with a letter and continues with letters, digits, pluses, dots and
dashes -- serves to indicate the type of the container, which isolators can use
to determine both what to do with a container and how to obtain it. For the
Docker URL type, for example, the absence of a host between the second and
third slashes could be interpreted to mean that the image should be fetched
from the Docker index or from a locally configured default Docker image
server; if a hostname is given, it is treated as the image server to use.
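The following sketch shows one way an isolator could split such a URL into
container type, optional host and path. The function name and the returned
dictionary layout are assumptions for illustration only.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), per RFC 3986, followed
# here by "://", an optional host, and an optional path.
CONTAINER_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/]*)(/.*)?$")


def parse_container_url(url):
    """Split a container URL into its type (scheme), host and path."""
    match = CONTAINER_URL.match(url)
    if match is None:
        raise ValueError("not a container URL: %r" % url)
    scheme, host, path = match.groups()
    return {
        "type": scheme.lower(),
        "host": host or None,  # None means "use the default image server"
        "path": path or "",
    }


if __name__ == "__main__":
    for example in (
        "docker:///johncosta/redis",
        "iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso",
        "lxc:///ubuntu",
    ):
        print(parse_container_url(example))
```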
The addition of "options" to the ContainerInfo poses a risk to portability and
warrants both explanation and justification. In the case of Docker URLs, for
example, it is possible to mount additional filesystems on the Docker command
line; and these filesystems can even be indicated by reference to another
Docker container by name. Support for this feature is clearly tied to the
Docker URL and its meaning.
When the default isolator for a slave is specified, there may also be a
default container specified. It is good for us, then, that the ContainerInfo
structure maps cleanly to an array of byte strings, since this is an easy
thing to handle from the command line.
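A minimal sketch of that mapping, assuming the first argument is the image and
every remaining argument is an option. The function names and the plain
dictionary standing in for ContainerInfo are hypothetical.

```python
def container_from_args(args):
    """Build a ContainerInfo-like dict from a list of command line strings.

    The first element is taken as the image and the rest as options, all
    stored as byte strings to match the proposed `bytes` fields.
    """
    if not args:
        return None  # no default container configured
    encoded = [arg.encode("utf-8") for arg in args]
    return {"image": encoded[0], "options": encoded[1:]}


def container_to_args(container):
    """Flatten a ContainerInfo-like dict back into an array of byte strings."""
    return [container["image"]] + list(container["options"])


if __name__ == "__main__":
    info = container_from_args(["docker:///storm-mesos/latest", "-p", "1337:8000"])
    print(info)
    print(container_to_args(info))
```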
Now in practice, how will we use the ContainerInfo? In the three use cases
outlined above -- service container, command container and containerized
executor -- tasks needing a special container will specify an ExecutorInfo
in the TaskInfo and not a bare CommandInfo. The ContainerInfo would then be
part of the CommandInfo embedded in the ExecutorInfo.
To consider a specific case, were the Storm framework packaged in a container,
then the same container could be used both for Nimbus and the worker nodes:
* Nimbus would be launched with a TaskInfo requesting the container and
launching Nimbus.
    TaskInfo {
      executor = ExecutorInfo {
        command = CommandInfo {
          value = "python /opt/storm/bin/storm go"
          container = ContainerInfo {
            image = "docker:///storm-mesos/latest"
            options = [ "-p", "1337:8000" ]
          }
        }
        ...
      }
      ...
    }
* Nimbus would launch executors with a TaskInfo requesting the very same
container, but specifying a different command.
    TaskInfo {
      executor = ExecutorInfo {
        command = CommandInfo {
          value = "curl -sSfL http://storm.server:1337/conf/storm.yaml -o /opt/storm/conf/storm.yaml && python /opt/storm/bin/storm supervisor storm.mesos.MesosSupervisor"
          container = ContainerInfo {
            image = "docker:///storm-mesos/latest"
          }
        }
        ...
      }
      ...
    }
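For illustration, a Docker-backed external isolator might turn the Nimbus
example into a command line along these lines. This is only a sketch: reading
the URL path as the image name, passing the options through untouched, and
running the command via `sh -c` (which assumes the image ships a shell) are
all assumptions, not part of the proposal.

```python
def docker_argv(image_url, options, command):
    """Build a `docker run [OPTIONS] IMAGE [COMMAND]` argv list.

    Only host-less docker:/// URLs are handled in this sketch.
    """
    prefix = "docker:///"
    if not image_url.startswith(prefix):
        raise ValueError("only host-less docker URLs are handled here")
    image = image_url[len(prefix):]
    # Options go before the image; the task command runs through a shell.
    return ["docker", "run"] + list(options) + [image, "sh", "-c", command]


if __name__ == "__main__":
    print(docker_argv(
        "docker:///storm-mesos/latest",
        ["-p", "1337:8000"],
        "python /opt/storm/bin/storm go",
    ))
```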
While in the near term we expect container URLs to be pretty specific to the
containerization mechanism, let us hope for a glorious future with URLs like
`img:///ubuntu-13.04` that point to well-known, portable images.
was:
Being able to delegate isolation to shell scripts could make it easier to
leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker
and similar containerization systems.
Why go through command line tools for isolation? We have seen many requests for
isolation, covering a wide variety of scenarios:
- Setups requiring multiple versions of the same language (Ruby 1.8, Ruby 1.9).
- Setups requiring installation and configuration of RPM-packaged applications.
- Build-and-test setups, where sharing the environment of the host would impact
reproducibility.
- Integration of 3rd party, service-oriented applications.
- Launching applications with Docker.
- Launching multiple instances of a Mesos framework that, like Hadoop, has
significant system setup and dependencies.
To cover these and other use cases, it seems reasonable to allow Mesos to
delegate to external programs for isolation:
- It makes it easier to experiment with new containerization tools.
- It allows for site administrators to customize containerization, or even
implement new containerization mechanisms, without impacting their ability to
keep pace with Mesos development.
- Many external programs exist for containerization -- Docker, LXC tools,
LibVirt -- which handle a great deal of the book-keeping around finding and
efficiently cloning disk images and setting up the guest system (its hostname,
TTYs, /dev/*, /proc).
The scenarios listed above can be understood in terms of three use cases:
- The containerized system service scenario, wherein an application, installed
with RPM or a similar tool, is started and managed by the init system within a
container. Percona MySQL is an example of such an application.
- The containerized application scenario, wherein an application is installed
or unpacked and then configured and launched in a single command. For example,
running a custom Rails app with bundle install && bundle exec rails.
- The containerized framework/executor scenario, wherein the application is
Spark, Hadoop or another Mesos framework/executor pair.
One way to achieve this could be to introduce an External Isolator, which works
in parallel with the existing process/posix and cgroups isolators. The
responsibility of this isolator would be to act as a thin layer to external
isolators. Calls for task launching, stopping or any other resource change
would be serialized and passed to the external isolators by the Mesos External
Isolator.
We think an approach like this adds a lot of flexibility while still keeping a
good clean architecture and avoids using executors for isolation.
However, we are currently exploring how to solve this problem so feel free to
opt in with ideas, comments and suggestions.
> Allow delegation to shell scripts for isolation
> -----------------------------------------------
>
> Key: MESOS-816
> URL: https://issues.apache.org/jira/browse/MESOS-816
> Project: Mesos
> Issue Type: Improvement
> Components: isolation, slave
> Reporter: Jason Dusek
> Priority: Minor
> Attachments: mesos-shell-isolator.jpg
>
> Original Estimate: 72h
> Remaining Estimate: 72h
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)