Repository: mesos Updated Branches: refs/heads/master 0e2301e0f -> 0f348c411
Added External Containerizer documentation. Adds a markdown document describing the ExternalContainerizer. Review: https://reviews.apache.org/r/22169/ Project: http://git-wip-us.apache.org/repos/asf/mesos/repo Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/0f348c41 Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/0f348c41 Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/0f348c41 Branch: refs/heads/master Commit: 0f348c411a301bee4b6af56d0e68120b064d6a2a Parents: 0e2301e Author: Till Toenshoff <[email protected]> Authored: Wed Nov 5 17:57:28 2014 +0100 Committer: Till Toenshoff <[email protected]> Committed: Wed Nov 5 18:13:12 2014 +0100 ---------------------------------------------------------------------- docs/external-containerizer.md | 494 ++++++++++++++++++++++++++++++ docs/home.md | 1 + docs/images/ec_kill_seqdiag.png | Bin 0 -> 11631 bytes docs/images/ec_launch_seqdiag.png | Bin 0 -> 24424 bytes docs/images/ec_lifecycle_seqdiag.png | Bin 0 -> 15288 bytes docs/images/ec_orphan_seqdiag.png | Bin 0 -> 20594 bytes docs/images/ec_recover_seqdiag.png | Bin 0 -> 25529 bytes 7 files changed, 495 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/external-containerizer.md ---------------------------------------------------------------------- diff --git a/docs/external-containerizer.md b/docs/external-containerizer.md new file mode 100644 index 0000000..0594bee --- /dev/null +++ b/docs/external-containerizer.md @@ -0,0 +1,494 @@ +--- +layout: documentation +--- + +# External Containerizer + + +* EC = external containerizer. A part of the mesos slave that provides +an API for containerizing via external plugin executables. +* ECP = external containerizer program. An external plugin executable +implementing the actual containerizing by interfacing with a +containerizing system (e.g. Docker). + +# Containerizing + + +# General Overview + +EC invokes ECP as a shell process, passing the command as a parameter +to the ECP executable. Additional data is exhanged via stdin and +stdout. + +The ECP is expected to return a zero exit code for all commands it was +able to process. A non-zero status code signals an error. Below you +will find an overview of the commands that have to be implemented by +an ECP, as well as their invocation scheme. + +The ECP is expected to be using stderr for state info and displaying +additional debug information. That information is getting logged to +a file, see [Enviroment: **Sandbox**](#sandbox). + + +### Call and communication scheme + +Interface describing the functions an ECP has to implement via +command calls. Many invocations on the ECP will also pass a +protobuf message along via stdin. Some invocations on the ECP also +expect to deliver a result protobuf message back via stdout. +All protobuf messages are prefixed by their original length - +this is sometimes referred to as âRecord-IOâ-format. See +[Record-IO De/Serializing Example](#record-io-deserializing-example). + +**COMMAND < INPUT-PROTO > RESULT-PROTO** + +* `launch < containerizer::Launch` +* `update < containerizer::Update` +* `usage < containerizer::Usage > mesos::ResourceStatistics` +* `wait < containerizer::Wait > containerizer::Termination` +* `destroy < containerizer::Destroy` +* `containers > containerizer::Containers` +* `recover` + + +# Command Ordering + +## Make no assumptions +Commands may pretty much come in any order. There is only one +exception to this rule; when launching a task, the EC will make sure +that the ECP first receives a `launch` on that specific container, all +other commands are queued until `launch` returns from the ECP. + + +# Use Cases + +## Task Launching EC / ECP Overview + +* EC invokes `launch` on the ECP. + * Along with that call, the ECP will receive a containerizer::Launch + protobuf message via stdin. + * ECP now makes sure the executor gets started. +**Note** that `launch` is not supposed to block. It should return +immediately after triggering the executor/command - that could be done +via fork-exec within the ECP. +* EC invokes `wait` on the ECP. + * Along with that call, the ECP will receive a containerizer::Wait + protobuf message via stdin. + * ECP now blocks until the launched command is reaped - that could be + implemented via waitpid within the ECP. + * Once the command is reaped, the ECP should deliver a + containerizer::Termination protobuf message via stdout, back to the + EC. + + +## Container Lifecycle Sequence Diagrams + + +### Container Launching + +A container is in a staging state and now gets started and observed +until it gets into a final state. + + + +### Container Running + +A container has gotten launched at some point and now is considered +being in a non terminal state by the slave. The following commands +will get triggered multiple times at the ECP over the lifetime of a +container. Their order however is not determined. + + + +### Resource Limitation + +While a container is active, a resource limitation was identified +(e.g. out of memory) by the ECP isolation mechanism of choice. + + + +## Slave Recovery Overview + +* Slave recovers via check pointed state. +* EC invokes `recover` on the ECP - there is no protobuf message sent +or expected as a result from this command. + * The ECP may try to recover internal states via its own failover +mechanisms, if needed. +* After `recover` returns, the EC will invoke `containers` on the ECP. + * The ECP should return Containers which is a list of currently + active containers. +**Note** these containers are known to the ECP but might in fact +partially be unknown to the slave (e.g. slave failed after launch but +before or within wait) - those containers are considered to be +orphans. +* The EC now compares the list of slave known containers to those +listed within `Containers`. For each orphan it identifies, the slave +will invoke a `wait` followed by a `destroy` on the ECP for those +containers. +* Slave will now call `wait` on the ECP (via EC) for all recovered +containers. This does once again put âwait' into the position of the +ultimate command reaper. + + +## Slave Recovery Sequence Diagram + +### Recovery + +While containers are active, the slave fails over. + + + +### Orphan Destruction + +Containers identified by the ECP as being active but not slave state +recoverable are getting terminated. + + + + +# Command Details + +## launch +### Start the containerized executor + +Hands over all information the ECP needs for launching a task +via an executor. +This call should not wait for the executor/command to return. The +actual reaping of the containerized command is done via the `wait` +call. + + launch < containerizer::Launch + +This call receives the containerizer::Launch protobuf via stdin. + + /** + * Encodes the launch command sent to the external containerizer + * program. + */ + message Launch { + required ContainerID container_id = 1; + optional TaskInfo task_info = 2; + optional ExecutorInfo executor_info = 3; + optional string directory = 4; + optional string user = 5; + optional SlaveID slave_id = 6; + optional string slave_pid = 7; + optional bool checkpoint = 8; + } + +This call does not return any data via stdout. + +## wait +### Gets information on the containerized executor's Termination + +Is expected to reap the executor/command. This call should block +until the executor/command has terminated. + + wait < containerizer::Wait > containerizer::Termination + +This call receives the containerizer::Wait protobuf via stdin. + + /** + * Encodes the wait command sent to the external containerizer + * program. + */ + message Wait { + required ContainerID container_id = 1; + } + +This call is expected to return containerizer::Termination via stdout. + + /** + * Information about a container termination, returned by the + * containerizer to the slave. + */ + message Termination { + // A container may be killed if it exceeds its resources; this will + // be indicated by killed=true and described by the message string. + required bool killed = 1; + required string message = 2; + + // Exit status of the process. + optional int32 status = 3; + } + +The Termination attribute `killed` is to be set only when the +containerizer or the underlying isolation had to enforce a limitation +by killing the task (e.g. task exceeded suggested memory limit). + +## update +### Updates the container's resource limits + +Is sending (new) resource constraints for the given container. +Resource constraints onto a container may vary over the lifetime of +the containerized task. + + update < containerizer::Update + +This call receives the containerizer::Update protobuf via stdin. + + /** + * Encodes the update command sent to the external containerizer + * program. + */ + message Update { + required ContainerID container_id = 1; + repeated Resource resources = 2; + } + +This call does not return any data via stdout. + +## usage +### Gathers resource usage statistics for a containerized task +Is used for polling the current resource uses for the given container. + + usage < containerizer::Usage > mesos::ResourceStatistics + +This call received the containerizer::Usage protobuf via stdin. + + /** + * Encodes the usage command sent to the external containerizer + * program. + */ + message Usage { + required ContainerID container_id = 1; + } + +This call is expected to return mesos::ResourceStatistics via stdout. + + /* + * A snapshot of resource usage statistics. + */ + message ResourceStatistics { + required double timestamp = 1; // Snapshot time, in seconds since the Epoch. + + // CPU Usage Information: + // Total CPU time spent in user mode, and kernel mode. + optional double cpus_user_time_secs = 2; + optional double cpus_system_time_secs = 3; + + // Number of CPUs allocated. + optional double cpus_limit = 4; + + // cpu.stat on process throttling (for contention issues). + optional uint32 cpus_nr_periods = 7; + optional uint32 cpus_nr_throttled = 8; + optional double cpus_throttled_time_secs = 9; + + // Memory Usage Information: + optional uint64 mem_rss_bytes = 5; // Resident Set Size. + + // Amount of memory resources allocated. + optional uint64 mem_limit_bytes = 6; + + // Broken out memory usage information (files, anonymous, and mmaped files) + optional uint64 mem_file_bytes = 10; + optional uint64 mem_anon_bytes = 11; + optional uint64 mem_mapped_file_bytes = 12; + } + +## destroy +### Terminates the containerized executor + +Is used in rare situations, like for graceful slave shutdown +but also in slave fail over scenarios - see Slave Recovery for more. + + destroy < containerizer::Destroy + +This call receives the containerizer::Destroy protobuf via stdin. + + /** + * Encodes the destroy command sent to the external containerizer + * program. + */ + message Destroy { + required ContainerID container_id = 1; + } + +This call does not return any data via stdout. + +## containers +### Gets all active container-id's + +Returns all container identifiers known to be currently active. + + containers > containerizer::Containers + +This call does not receive any additional data via stdin. + +This call is expected to pass containerizer::Containers back via +stdout. + + /** + * Information on all active containers returned by the containerizer + * to the slave. + */ + message Containers { + repeated ContainerID containers = 1; + } + + +## recover +### Internal ECP state recovery + +Allows the ECP to do a state recovery on its own. If the ECP +uses state check-pointing e.g. via file system, then this call would +be a good moment to de-serialize that state information. Make sure you +also see [Slave Recovery Overview](#slave-recovery-overview) for more. + + recover + +This call does not receive any additional data via stdin. +No returned data via stdout. + + + +### Protobuf Message Definitions + +For possibly more up-to-date versions of the above mentioned protobufs +as well as protobuf messages referenced by them, please check: + +* containerizer::XXX are defined within + include/mesos/containerizer/containerizer.proto. + +* mesos::XXX are defined within include/mesos/mesos.proto. + + + +# Environment + +## **Sandbox** + +A sandbox environment is formed by `cd` into the work-directory of the +executor as well as a stderr redirect into the executor's "stderr" +log-file. +**Note** not **all** invocations have a complete sandbox environment. + + +## Addional Environment Variables + +Additionally, there are a few new environment variables set when +invoking the ECP. + + +* MESOS_LIBEXEC_DIRECTORY = path to mesos-executor, mesos-usage, ... +This information is always present. + +* MESOS_WORK_DIRECTORY = slave work directory. This should be used for +distinguishing slave instances. +This information is always present. + +**Note** that this is specifically helpful for being able to tie a set +of containers to a specific slave instance, thus allowing proper +recovery when needed. + +* MESOS_DEFAULT_CONTAINER_IMAGE = default image as provided via slave +flags (default_container_image). This variable is provided only in +calls to `launch`. + + + +# Debugging + +## Enhanced Verbosity Logging + +For receiving an increased level of status information from the EC +use the GLOG verbosity level. Prefix your mesos startup call by +setting the level to a value higher than or equal to two. + +`GLOG_v=2 ./bin/mesos-slave --master=[...]` + + +## ECP stderr Logging + +All output to stderr of your ECP will get logged to the executor's +'stderr' log file. +The specific location can be extracted from the [Enhanced Verbosity +Logging](#enhanced-verbosity-logging) of the EC. + +Example Log Output: + + I0603 02:12:34.165662 174215168 external_containerizer.cpp:1083] Invoking external containerizer for method 'launch' + I0603 02:12:34.165675 174215168 external_containerizer.cpp:1100] calling: [/Users/till/Development/mesos-till/build/src/test-containerizer launch] + I0603 02:12:34.165678 175824896 slave.cpp:497] Successfully attached file '/tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345' + I0603 02:12:34.165686 174215168 external_containerizer.cpp:1101] directory: /tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345 + +The stderr output of the ECP for this call is found within the stderr file located in the directory displayed in the last quoted line. + + cat /tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345/stderr + + +# Appendix + +## Record-IO Proto Example: Launch + +This is what a properly record-io formatted protobuf looks like. + +**name: offset** + +* length: 00 - 03 = record length in byte + +* payload: 04 - (length + 4) = protobuf payload + +Example length: 00000240h = 576 byte total protobuf size + +Example Hexdump: + + 00000000: 4002 0000 0a26 0a24 3433 3532 3533 6162 2d64 3234 362d 3437 :@....&.$435253ab-d246-47 + 00000018: 6265 2d61 3335 302d 3335 3432 3034 3635 6438 3638 1a81 020a :be-a350-35420465d868.... + 00000030: 030a 0131 2a16 0a04 6370 7573 1000 1a09 0900 0000 0000 0000 :...1*...cpus............ + 00000048: 4032 012a 2a15 0a03 6d65 6d10 001a 0909 0000 0000 0000 9040 :@2.**...mem............@ + 00000060: 3201 2a2a 160a 0464 6973 6b10 001a 0909 0000 0000 0000 9040 :2.**...disk............@ + 00000078: 3201 2a2a 180a 0570 6f72 7473 1001 220a 0a08 0898 f201 1080 :2.**...ports.."......... + 00000090: fa01 3201 2a3a 2a1a 2865 6368 6f20 274e 6f20 7375 6368 2066 :..2.*:*.(echo 'No such f + 000000a8: 696c 6520 6f72 2064 6972 6563 746f 7279 273b 2065 7869 7420 :ile or directory'; exit + 000000c0: 3142 2b0a 2932 3031 3430 3532 362d 3031 3530 3036 2d31 3637 :1B+.)20140526-015006-167 + 000000d8: 3737 3334 332d 3535 3430 332d 3632 3536 372d 3030 3030 4a3d :77343-55403-62567-0000J= + 000000f0: 436f 6d6d 616e 6420 4578 6563 7574 6f72 2028 5461 736b 3a20 :Command Executor (Task: + 00000108: 3129 2028 436f 6d6d 616e 643a 2073 6820 2d63 2027 7768 696c :1) (Command: sh -c 'whil + 00000120: 6520 7472 7565 203b 2e2e 2e27 2952 0131 22c5 012f 746d 702f :e true ;...')R.1"../tmp/ + 00000138: 4578 7465 726e 616c 436f 6e74 6169 6e65 7269 7a65 7254 6573 :ExternalContainerizerTes + 00000150: 745f 4c61 756e 6368 5f6c 5855 6839 662f 736c 6176 6573 2f32 :t_Launch_lXUh9f/slaves/2 + 00000168: 3031 3430 3532 362d 3031 3530 3036 2d31 3637 3737 3334 332d :0140526-015006-16777343- + 00000180: 3535 3430 332d 3632 3536 372d 302f 6672 616d 6577 6f72 6b73 :55403-62567-0/frameworks + 00000198: 2f32 3031 3430 3532 362d 3031 3530 3036 2d31 3637 3737 3334 :/20140526-015006-1677734 + 000001b0: 332d 3535 3430 332d 3632 3536 372d 3030 3030 2f65 7865 6375 :3-55403-62567-0000/execu + 000001c8: 746f 7273 2f31 2f72 756e 732f 3433 3532 3533 6162 2d64 3234 :tors/1/runs/435253ab-d24 + 000001e0: 362d 3437 6265 2d61 3335 302d 3335 3432 3034 3635 6438 3638 :6-47be-a350-35420465d868 + 000001f8: 2a04 7469 6c6c 3228 0a26 3230 3134 3035 3236 2d30 3135 3030 :*.till2(.&20140526-01500 + 00000210: 362d 3136 3737 3733 3433 2d35 3534 3033 2d36 3235 3637 2d30 :6-16777343-55403-62567-0 + 00000228: 3a18 736c 6176 6528 3129 4031 3237 2e30 2e30 2e31 3a35 3534 ::.slave(1)@127.0.0.1:554 + 00000240: 3033 4000 + +## Record-IO De/Serializing Example +How to send and receive such record-io formatted message +using Python + +*taken from src/examples/python/test_containerizer.py* + + # Read a data chunk prefixed by its total size from stdin. + def receive(): + # Read size (uint32 => 4 bytes). + size = struct.unpack('I', sys.stdin.read(4)) + if size[0] <= 0: + print >> sys.stderr, "Expected protobuf size over stdin. " \ + "Received 0 bytes." + return "" + + # Read payload. + data = sys.stdin.read(size[0]) + if len(data) != size[0]: + print >> sys.stderr, "Expected %d bytes protobuf over stdin. " \ + "Received %d bytes." % (size[0], len(data)) + return "" + + return data + + # Write a protobuf message prefixed by its total size (aka recordio) + # to stdout. + def send(data): + # Write size (uint32 => 4 bytes). + sys.stdout.write(struct.pack('I', len(data))) + + # Write payload. + sys.stdout.write(data) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/home.md ---------------------------------------------------------------------- diff --git a/docs/home.md b/docs/home.md index 7afc4a3..33458cc 100644 --- a/docs/home.md +++ b/docs/home.md @@ -13,6 +13,7 @@ layout: documentation * [Configuration](/documentation/latest/configuration/) for command-line arguments. * [Docker Containerizer](/documentation/latest/docker-containerizer/) for launching a Docker image as a Task, or as an Executor. +* [External Containerizer](/documentation/latest/external-containerizer/) * [Framework Authorization](/documentation/latest/authorization/) * [Framework Rate Limiting](/documentation/latest/framework-rate-limiting/) * [High Availability](/documentation/latest/high-availability/) for running multiple masters simultaneously. http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/images/ec_kill_seqdiag.png ---------------------------------------------------------------------- diff --git a/docs/images/ec_kill_seqdiag.png b/docs/images/ec_kill_seqdiag.png new file mode 100644 index 0000000..e81e517 Binary files /dev/null and b/docs/images/ec_kill_seqdiag.png differ http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/images/ec_launch_seqdiag.png ---------------------------------------------------------------------- diff --git a/docs/images/ec_launch_seqdiag.png b/docs/images/ec_launch_seqdiag.png new file mode 100644 index 0000000..96a8af0 Binary files /dev/null and b/docs/images/ec_launch_seqdiag.png differ http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/images/ec_lifecycle_seqdiag.png ---------------------------------------------------------------------- diff --git a/docs/images/ec_lifecycle_seqdiag.png b/docs/images/ec_lifecycle_seqdiag.png new file mode 100644 index 0000000..09005a4 Binary files /dev/null and b/docs/images/ec_lifecycle_seqdiag.png differ http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/images/ec_orphan_seqdiag.png ---------------------------------------------------------------------- diff --git a/docs/images/ec_orphan_seqdiag.png b/docs/images/ec_orphan_seqdiag.png new file mode 100644 index 0000000..670f872 Binary files /dev/null and b/docs/images/ec_orphan_seqdiag.png differ http://git-wip-us.apache.org/repos/asf/mesos/blob/0f348c41/docs/images/ec_recover_seqdiag.png ---------------------------------------------------------------------- diff --git a/docs/images/ec_recover_seqdiag.png b/docs/images/ec_recover_seqdiag.png new file mode 100644 index 0000000..18dabcc Binary files /dev/null and b/docs/images/ec_recover_seqdiag.png differ
