[
https://issues.apache.org/jira/browse/MESOS-816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880586#comment-13880586
]
Niklas Quarfot Nielsen commented on MESOS-816:
----------------------------------------------
Here is a preliminary (work-in-progress) arch document on the pluggable
containerizer:
https://docs.google.com/document/d/1oO0oDmCphku4X-CO0Mja_QeH-LuHeWySHOcxMbtvLDg/edit?usp=sharing
We will follow up with an open review request in a couple of days with the
current implementation. It is based on Ian's patches and will need to track
whatever changes follow there. Feel free to add comments and/or suggest
changes.
> Allow delegation to shell scripts for isolation
> -----------------------------------------------
>
> Key: MESOS-816
> URL: https://issues.apache.org/jira/browse/MESOS-816
> Project: Mesos
> Issue Type: Improvement
> Components: isolation, slave
> Reporter: Jason Dusek
> Priority: Minor
> Attachments: mesos-shell-isolator.jpg
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Being able to delegate isolation to shell scripts could make it easier to
> leverage the machinery provided by the LXC tools, LibVirt, VirtualBox, Docker
> and similar containerization systems.
> Why go through command line tools for isolation? We have seen many requests
> for isolation, covering a wide variety of scenarios:
> - Setups requiring multiple versions of the same language (Ruby 1.8, Ruby
> 1.9).
> - Setups requiring installation and configuration of RPM-packaged
> applications.
> - Build-and-test setups, where sharing the environment of the host would
> impact reproducibility.
> - Integration of 3rd party, service-oriented applications.
> - Launching applications with Docker.
> - Launching multiple instances of a Mesos framework that, like Hadoop, has
> significant system setup and dependencies.
> To cover these and other use cases, it seems reasonable to allow Mesos to
> delegate to external programs for isolation:
> - It makes it easier to experiment with new containerization tools.
> - It allows site administrators to customize containerization, or even
> implement new containerization mechanisms, without impacting their ability to
> keep pace with Mesos development.
> - Many external programs exist for containerization -- Docker, LXC tools,
> LibVirt -- which handle a great deal of the book-keeping around finding and
> efficiently cloning disk images and setting up the guest system (its
> hostname, TTYs, /dev/*, /proc).
> The scenarios listed above can be understood in terms of three use cases:
> - The containerized system service scenario, wherein an application,
> installed with RPM or a similar tool, is started and managed by the init
> system within a container. Percona MySQL is an example of such an application.
> - The containerized application scenario, wherein an application is installed
> or unpacked and then configured and launched in a single command. For
> example, running a custom Rails app with bundle install && bundle exec rails.
> - The containerized framework/executor scenario, wherein the application is
> Spark, Hadoop or another Mesos framework/executor pair.
> One way to achieve this could be to introduce an External Isolator, which
> works in parallel with the existing process/posix and cgroups isolators. The
> responsibility of this isolator would be to act as a thin layer to external
> isolators. Calls for task launching, stopping or any other resource change
> would be serialized and passed to the external isolators by the Mesos
> External Isolator.
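
A minimal sketch of the "thin layer" idea above, in Python. The issue does not
say how calls are serialized, so the one-line JSON format, the field names
(action, container_id, resources) and the stand-in isolator command are all
assumptions made for illustration, not part of any Mesos interface.

```python
import json
import subprocess
import sys


def serialize_call(action, container_id, resources):
    """Encode one isolator call as JSON (hypothetical wire format)."""
    return json.dumps(
        {"action": action, "container_id": container_id, "resources": resources},
        sort_keys=True,
    )


def delegate(isolator_argv, action, container_id, resources):
    """Run the external isolator, pass the serialized call on stdin,
    and return its exit code and standard output."""
    payload = serialize_call(action, container_id, resources)
    result = subprocess.run(
        isolator_argv,
        input=payload,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout


# Stand-in for a site-provided isolator script (an assumption): it reads the
# call from stdin and echoes the action it was asked to perform.
ECHO_ISOLATOR = [
    sys.executable,
    "-c",
    "import json, sys; call = json.load(sys.stdin); print(call['action'])",
]

code, out = delegate(ECHO_ISOLATOR, "launch", "task-1", {"cpus": 1, "mem_mb": 128})
```

In a real deployment the argv would point at a shell script wrapping Docker,
LXC or similar; the point is only that the Mesos side stays a small
serialize-and-exec shim.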
> Allowing for pluggable isolators invites the possibility of having different
> isolators per task. For applications using containers, it's reasonable that
> each application or framework can specify a different base image; and this
> would be an option passed to the corresponding isolator. One can also imagine
> specialized frameworks that need to disable isolation entirely. For example,
> a "system backup" framework that would specify a null isolator to allow it to
> snapshot interesting data on each slave and transfer it to a sanctioned
> storage location.
> However, having users and framework authors specify isolators would both
> harm portability and make isolation their problem, no longer something
> handled transparently by Mesos. Furthermore, it would have the unintended
> effect of putting them at odds with site administrators, who would also
> specify isolators -- as a command line option for each slave.
> Allowing tasks to carry a more abstract notion of "container" with them would
> cover most of the application-level scenarios we've outlined above.
> Theoretically, more than one isolator might be able to handle a given
> container. For example, if the container is specified as an "ISO" and a
> distro LiveCD is provided, one could imagine a Docker isolator, LXC isolator
> or VirtualBox isolator handling it. Encouraging users and framework authors
> to specify a container would be simpler for them than specifying isolator
> flags, would let them document their intent more clearly, and would reduce
> the scope for conflict with other parties who have an interest in upgrading
> and tuning isolation. It also makes applications and command examples more
> portable, by decoupling the isolation mechanism from the desired container
> layout (which is, more or less, a chroot with some files in it).
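
To illustrate the decoupling described above, here is a rough Python sketch in
which the task names only a container type and the site's preference order
decides which isolator handles it. The HANDLERS table and the isolator names
are hypothetical; only the "ISO" row reflects the example given in the text.

```python
# Hypothetical table: which isolators can handle which container type.
HANDLERS = {
    "docker": ["docker"],
    "lxc": ["lxc"],
    "iso": ["docker", "lxc", "virtualbox"],
}


def choose_isolator(container_type, site_preference):
    """Return the first isolator in the site's preference order that can
    handle the given container type, or None if none of them can."""
    capable = HANDLERS.get(container_type, [])
    for isolator in site_preference:
        if isolator in capable:
            return isolator
    return None
```

With this split, a site administrator can reorder or replace isolators without
any change to the container a framework asks for.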
> To this end, we propose adding an optional ContainerInfo to each CommandInfo:
> message CommandInfo {
>   message ContainerInfo {
>     required bytes image = 1;
>     repeated bytes options = 2;
>   }
>   ...
>   optional ContainerInfo container = 4;
> }
> The first field of the ContainerInfo should indicate the image, perhaps as a
> URL. For example:
> docker:///johncosta/redis
> iso+http://mirrors.kernel.org/knoppix/KNOPPIX_V7.2.0CD-2013-06-16-EN.iso
> lxc:///ubuntu
> The scheme of the URL -- recognizable as a string of letters and digits and
> perhaps plusses, dots and dashes preceding the first `://`, per RFC 3986 --
> serves to indicate the type of the container, which isolators can use to
> determine both what to do with a container and how to obtain it. For the
> Docker URL type, for example, the absence of a host between the second and
> third slashes could be interpreted to mean that the image should be fetched
> from the Docker index or from a locally configured default Docker image
> server; whereas if a hostname is given, it is treated as the image server to
> use.
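
The scheme and host rules above can be sketched in a few lines of Python. The
function name parse_image and the choice to return None for an absent host are
assumptions for illustration; the scheme pattern follows RFC 3986 (a letter,
then letters, digits, "+", "-" or "."). Anything between "://" and the next
slash is treated as the host, without splitting off ports or user info.

```python
import re

# scheme "://" host "/" path, where host may be empty (as in docker:///name).
IMAGE_URL = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://(?P<host>[^/]*)(?P<path>/.*)?$"
)


def parse_image(url):
    """Split a container image URL into (scheme, host, path).

    host is None when nothing appears between the second and third slashes,
    which an isolator could take to mean "use the default image server".
    """
    match = IMAGE_URL.match(url)
    if match is None:
        raise ValueError("not a container image URL: %r" % url)
    host = match.group("host") or None
    path = (match.group("path") or "").lstrip("/")
    return match.group("scheme"), host, path
```

An isolator would then branch on the scheme to decide whether it can handle
the container at all, and on the host to decide where to fetch the image.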
> The addition of "options" to the ContainerInfo poses a risk to portability
> and warrants both explanation and justification. In the case of Docker URLs,
> for example, it is possible to mount additional filesystems on the Docker
> command line; and these filesystems can even be indicated by reference to
> another Docker container by name. Support for this feature is clearly tied to
> the Docker URL and its meaning.
> When the default isolator for a slave is specified, there may also be a
> default container specified. It is good for us, then, that the ContainerInfo
> structure maps cleanly to an array of byte strings, since this is an easy
> thing to handle from the command line.
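
As a sketch of why this mapping is convenient: a ContainerInfo can be flattened
to a list of byte strings (image first, options after) and recovered from one.
The function names are hypothetical, and no particular slave command line flag
is implied.

```python
def container_to_args(image, options):
    """Flatten a ContainerInfo into byte strings: the image, then the options."""
    return [image] + list(options)


def container_from_args(args):
    """Inverse mapping: the first element is the image, the rest are options."""
    if not args:
        raise ValueError("a container needs at least an image")
    return args[0], list(args[1:])


args = container_to_args(b"docker:///storm-mesos/latest", [b"-p", b"1337:8000"])
```

Because the round trip is lossless, a default container given on the command
line and one supplied in a CommandInfo can be treated the same way.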
> Now in practice, how will we use the ContainerInfo? In the three use cases
> outlined above -- service container, command container and containerized
> executor -- tasks needing a special container will specify an ExecutorInfo in
> the TaskInfo and not a bare CommandInfo. The ContainerInfo would then be part
> of the CommandInfo embedded in the ExecutorInfo.
> To consider a specific case, were the Storm framework packaged in a
> container, then the same container could be used both for Nimbus and the
> worker nodes:
> * Nimbus would be launched with a TaskInfo requesting the container and
> launching Nimbus.
>   TaskInfo {
>     executor = ExecutorInfo {
>       command = CommandInfo {
>         value = "python /opt/storm/bin/storm go"
>         containerInfo = ContainerInfo {
>           image = "docker:///storm-mesos/latest"
>           options = [ "-p", "1337:8000" ]
>         }
>       }
>       ...
>     }
>     ...
>   }
> * Nimbus would launch executors with a TaskInfo requesting the very same
> container, but specifying a different command.
>   TaskInfo {
>     executor = ExecutorInfo {
>       command = CommandInfo {
>         value = "curl -sSfL http://storm.server:1337/conf/storm.yaml -o
>                  /opt/storm/conf/storm.yaml && python /opt/storm/bin/storm supervisor
>                  storm.mesos.MesosSupervisor"
>         containerInfo = ContainerInfo {
>           image = "docker:///storm-mesos/latest"
>         }
>       }
>       ...
>     }
>     ...
>   }
> While in the near term we expect container URLs to be pretty specific to the
> containerization mechanism, let us hope for a glorious future with URLs like
> `img:///ubuntu-13.04` that point to well-known, portable images.