Modified: mesos/site/source/documentation/latest.html.md URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest.html.md?rev=1638021&r1=1638020&r2=1638021&view=diff ============================================================================== --- mesos/site/source/documentation/latest.html.md (original) +++ mesos/site/source/documentation/latest.html.md Tue Nov 11 04:11:00 2014 @@ -13,6 +13,7 @@ layout: documentation * [Configuration](/documentation/latest/configuration/) for command-line arguments. * [Docker Containerizer](/documentation/latest/docker-containerizer/) for launching a Docker image as a Task, or as an Executor. +* [External Containerizer](/documentation/latest/external-containerizer/) * [Framework Authorization](/documentation/latest/authorization/) * [Framework Rate Limiting](/documentation/latest/framework-rate-limiting/) * [High Availability](/documentation/latest/high-availability/) for running multiple masters simultaneously.
Modified: mesos/site/source/documentation/latest/configuration.md URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/configuration.md?rev=1638021&r1=1638020&r2=1638021&view=diff ============================================================================== --- mesos/site/source/documentation/latest/configuration.md (original) +++ mesos/site/source/documentation/latest/configuration.md Tue Nov 11 04:11:00 2014 @@ -2,6 +2,7 @@ layout: documentation --- + # Mesos Configuration The Mesos master and slave can take a variety of configuration options through command-line arguments, or environment variables. A list of the available options can be seen by running `mesos-master --help` or `mesos-slave --help`. Each option can be set in two ways: @@ -19,384 +20,1401 @@ If you have special compilation requirem *These options can be supplied to both masters and slaves.* -``` - --ip=VALUE IP address to listen on - - --[no-]help Prints this help message (default: false) - - --log_dir=VALUE Location to put log files (no default, nothing - is written to disk unless specified; - does not affect logging to stderr) - - --logbufsecs=VALUE How many seconds to buffer log messages for (default: 0) - - --logging_level=VALUE Log message at or above this level; possible values: - 'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this - will affect just the logs from log_dir (if specified) (default: INFO) - - --port=VALUE Port to listen on (master default: 5050 and slave default: 5051) - - --[no-]quiet Disable logging to stderr (default: false) - - --[no-]version Show version and exit. 
(default: false) -``` +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + + <tr> + <td> + --ip=VALUE + </td> + <td> + IP address to listen on + + </td> + </tr> + <tr> + <td> + --[no-]help + </td> + <td> + Prints this help message (default: false) + + </td> + </tr> + <tr> + <td> + --[no-]initialize_driver_logging + </td> + <td> + Whether to automatically initialize google logging of scheduler + and/or executor drivers. (default: true) + + </td> + </tr> + <tr> + <td> + --log_dir=VALUE + </td> + <td> + Location to put log files (no default, nothing + is written to disk unless specified; + does not affect logging to stderr) + + </td> + </tr> + <tr> + <td> + --logbufsecs=VALUE + </td> + <td> + How many seconds to buffer log messages for (default: 0) + + </td> + </tr> + <tr> + <td> + --logging_level=VALUE + </td> + <td> + Log message at or above this level; possible values: + 'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this + will affect just the logs from log_dir (if specified) (default: INFO) + + </td> + </tr> + <tr> + <td> + --port=VALUE + </td> + <td> + Port to listen on (master default: 5050 and slave default: 5051) + + </td> + </tr> + <tr> + <td> + --[no-]quiet + </td> + <td> + Disable logging to stderr (default: false) + + </td> + </tr> + <tr> + <td> + --[no-]version + </td> + <td> + Show version and exit. (default: false) +</table> ## Master Options *Required Flags* -``` - --quorum=VALUE The size of the quorum of replicas when using 'replicated_log' based - registry. It is imperative to set this value to be a majority of - masters i.e., quorum > (number of masters)/2. - - --work_dir=VALUE Where to store the persistent information stored in the Registry. 
- - --zk=VALUE ZooKeeper URL (used for leader election amongst masters) - May be one of: - zk://host1:port1,host2:port2,.../path - zk://username:password@host1:port1,host2:port2,.../path - file://path/to/file (where file contains one of the above) -``` +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --quorum=VALUE + </td> + <td> + The size of the quorum of replicas when using 'replicated_log' based + registry. It is imperative to set this value to be a majority of + masters i.e., quorum > (number of masters)/2. + + </td> + </tr> + <tr> + <td> + --work_dir=VALUE + </td> + <td> + Where to store the persistent information stored in the Registry. + + </td> + </tr> + <tr> + <td> + --zk=VALUE + </td> + <td> + ZooKeeper URL (used for leader election amongst masters) + May be one of: +<pre><code>zk://host1:port1,host2:port2,.../path +zk://username:password@host1:port1,host2:port2,.../path +file://path/to/file (where file contains one of the above)</code></pre> + </td> + </tr> +</table> *Optional Flags* -``` - --allocation_interval=VALUE Amount of time to wait between performing - (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs) - --[no-]authenticate If authenticate is 'true' only authenticated frameworks are allowed - to register. If 'false' unauthenticated frameworks are also - allowed to register. (default: false) - --[no-]authenticate_slaves If 'true' only authenticated slaves are allowed to register. - If 'false' unauthenticated slaves are also allowed to register. (default: false) - --cluster=VALUE Human readable name for the cluster, - displayed in the webui. - --credentials=VALUE Path to a file with a list of credentials. - Each line contains 'principal' and 'secret' separated by whitespace. - Path could be of the form 'file:///path/to/file' or '/path/to/file'. 
- --framework_sorter=VALUE Policy to use for allocating resources - between a given user's frameworks. Options - are the same as for user_allocator. (default: drf) - --hostname=VALUE The hostname the master should advertise in ZooKeeper. - If left unset, the hostname is resolved from the IP address that the master binds to. - --[no-]log_auto_initialize Whether to automatically initialize the replicated log used for the - registry. If this is set to false, the log has to be manually - initialized when used for the very first time. (default: true) - - --recovery_slave_removal_limit=VALUE For failovers, limit on the percentage of slaves that can be removed - from the registry *and* shutdown after the re-registration timeout - elapses. If the limit is exceeded, the master will fail over rather - than remove the slaves. - This can be used to provide safety guarantees for production - environments. Production environments may expect that across Master - failovers, at most a certain percentage of slaves will fail - permanently (e.g. due to rack-level failures). - Setting this limit would ensure that a human needs to get - involved should an unexpected widespread failure of slaves occur - in the cluster. - Values: [0%-100%] (default: 100%) - - --registry=VALUE Persistence strategy for the registry; - available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log) - - --registry_fetch_timeout=VALUE Duration of time to wait in order to fetch data from the registry - after which the operation is considered a failure. (default: 1mins) - - --registry_store_timeout=VALUE Duration of time to wait in order to store data in the registry - after which the operation is considered a failure. (default: 5secs) - - --[no-]registry_strict Whether the Master will take actions based on the persistent - information stored in the Registry. Setting this to false means - that the Registrar will never reject the admission, readmission, - or removal of a slave. 
Consequently, 'false' can be used to - bootstrap the persistent state on a running cluster. - NOTE: This flag is *experimental* and should not be used in - production yet. (default: false) - - --roles=VALUE A comma separated list of the allocation - roles that frameworks in this cluster may - belong to. - - --[no-]root_submissions Can root submit frameworks? (default: true) - - --slave_reregister_timeout=VALUE The timeout within which all slaves are expected to re-register - when a new master is elected as the leader. Slaves that do not - re-register within the timeout will be removed from the registry - and will be shutdown if they attempt to communicate with master. - NOTE: This value has to be atleast 10mins. (default: 10mins) - - --user_sorter=VALUE Policy to use for allocating resources - between users. May be one of: - dominant_resource_fairness (drf) (default: drf) - - --webui_dir=VALUE Location of the webui files/assets (default: /usr/local/share/mesos/webui) - - --weights=VALUE A comma separated list of role/weight pairs - of the form 'role=weight,role=weight'. Weights - are used to indicate forms of priority. - - --whitelist=VALUE Path to a file with a list of slaves - (one per line) to advertise offers for. - Path could be of the form 'file:///path/to/file' or '/path/to/file'. (default: *) - - --zk_session_timeout=VALUE ZooKeeper session timeout. (default: 10secs) -``` +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --acls=VALUE + </td> + <td> + The value could be a JSON formatted string of ACLs + or a file path containing the JSON formatted ACLs used + for authorization. Path could be of the form <code>file:///path/to/file</code> + or <code>/path/to/file</code>. + <p/> + See the ACLs protobuf in mesos.proto for the expected format. 
+ <p/> + JSON file example: +<pre><code>{ + "register_frameworks": [ + { + "principals": { "type": "ANY" }, + "roles": { "values": ["a"] } + } + ], + "run_tasks": [ + { + "principals": { "values": ["a", "b"] }, + "users": { "values": ["c"] } + } + ], + "shutdown_frameworks": [ + { + "principals": { "values": ["a", "b"] }, + "framework_principals": { "values": ["c"] } + } + ] +}</code></pre> + </td> + </tr> + <tr> + <td> + --allocation_interval=VALUE + </td> + <td> + Amount of time to wait between performing + (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs) + </td> + </tr> + <tr> + <td> + --[no-]authenticate + </td> + <td> + If authenticate is 'true' only authenticated frameworks are allowed + to register. If 'false' unauthenticated frameworks are also + allowed to register. (default: false) + </td> + </tr> + <tr> + <td> + --[no-]authenticate_slaves + </td> + <td> + If 'true' only authenticated slaves are allowed to register. + <p/> + If 'false' unauthenticated slaves are also allowed to register. (default: false) + </td> + </tr> + <tr> + <td> + --authenticators=VALUE + </td> + <td> + Authenticator implementation to use when authenticating frameworks + and/or slaves. Use the default <code>crammd5</code>, or + load an alternate authenticator module using <code>--modules</code>. (default: crammd5) + </td> + </tr> + <tr> + <td> + --cluster=VALUE + </td> + <td> + Human readable name for the cluster, + displayed in the webui. + </td> + </tr> + <tr> + <td> + --credentials=VALUE + </td> + <td> + Either a path to a text file with a list of credentials, + each line containing 'principal' and 'secret' separated by whitespace, + or, a path to a JSON-formatted file containing credentials. + Path could be of the form <code>file:///path/to/file</code> or <code>/path/to/file</code>. 
+ <p/> + JSON file Example: +<pre><code>{ + "credentials": [ + { + "principal": "sherman", + "secret": "kitesurf" + } + ] +}</code></pre> + + <p/> + Text file Example: +<pre><code> username secret </code></pre> + + </td> + </tr> + <tr> + <td> + --framework_sorter=VALUE + </td> + <td> + Policy to use for allocating resources + between a given user's frameworks. Options + are the same as for user_allocator. (default: drf) + </td> + </tr> + <tr> + <td> + --hostname=VALUE + </td> + <td> + The hostname the master should advertise in ZooKeeper. + If left unset, the hostname is resolved from the IP address + that the master binds to. + </td> + </tr> + <tr> + <td> + --[no-]log_auto_initialize + </td> + <td> + Whether to automatically initialize the replicated log used for the + registry. If this is set to false, the log has to be manually + initialized when used for the very first time. (default: true) + </td> + </tr> + <tr> + <td> + --modules=VALUE + </td> + <td> + List of modules to be loaded and be available to the internal + subsystems. + <p/> + Use <code>--modules=filepath</code> to specify the list of modules via a + file containing a JSON formatted string. 'filepath' can be + of the form <code>file:///path/to/file</code> or <code>/path/to/file</code>. + <p/> + Use <code>--modules="{...}"</code> to specify the list of modules inline. + <p/> + JSON file example: +<pre><code>{ + "libraries": [ + { + "file": "/path/to/libfoo.so", + "modules": [ + { + "name": "org_apache_mesos_bar", + "parameters": [ + { + "key": "X", + "value": "Y" + } + ] + }, + { + "name": "org_apache_mesos_baz" + } + ] + }, + { + "name": "qux", + "modules": [ + { + "name": "org_apache_mesos_norf" + } + ] + } + ] +}</code></pre> + </td> + </tr> + <tr> + <td> + --offer_timeout=VALUE + </td> + <td> + Duration of time before an offer is rescinded from a framework. + <p/> + This helps fairness when running frameworks that hold on to offers, + or frameworks that accidentally drop offers. 
+ + </td> + </tr> + <tr> + <td> + --rate_limits=VALUE + </td> + <td> + The value could be a JSON formatted string of rate limits + or a file path containing the JSON formatted rate limits used + for framework rate limiting. + <p/> + Path could be of the form <code>file:///path/to/file</code> + or <code>/path/to/file</code>. + <p/> + + See the RateLimits protobuf in mesos.proto for the expected format. + <p/> + + Example: +<pre><code>{ + "limits": [ + { + "principal": "foo", + "qps": 55.5 + }, + { + "principal": "bar" + } + ], + "aggregate_default_qps": 33.3 +}</code></pre> + </td> + </tr> + <tr> + <td> + --recovery_slave_removal_limit=VALUE + </td> + <td> + For failovers, limit on the percentage of slaves that can be removed + from the registry *and* shutdown after the re-registration timeout + elapses. If the limit is exceeded, the master will fail over rather + than remove the slaves. + <p/> + This can be used to provide safety guarantees for production + environments. Production environments may expect that across Master + failovers, at most a certain percentage of slaves will fail + permanently (e.g. due to rack-level failures). + <p/> + Setting this limit would ensure that a human needs to get + involved should an unexpected widespread failure of slaves occur + in the cluster. + <p/> + Values: [0%-100%] (default: 100%) + </td> + </tr> + <tr> + <td> + --registry=VALUE + </td> + <td> + Persistence strategy for the registry; + <p/> + available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log) + </td> + </tr> + <tr> + <td> + --registry_fetch_timeout=VALUE + </td> + <td> + Duration of time to wait in order to fetch data from the registry + after which the operation is considered a failure. (default: 1mins) + </td> + </tr> + <tr> + <td> + --registry_store_timeout=VALUE + </td> + <td> + Duration of time to wait in order to store data in the registry + after which the operation is considered a failure. 
(default: 5secs) + </td> + </tr> + <tr> + <td> + --[no-]registry_strict + </td> + <td> + Whether the Master will take actions based on the persistent + information stored in the Registry. Setting this to false means + that the Registrar will never reject the admission, readmission, + or removal of a slave. Consequently, 'false' can be used to + bootstrap the persistent state on a running cluster. + <p/> + NOTE: This flag is *experimental* and should not be used in + production yet. (default: false) + </td> + </tr> + <tr> + <td> + --roles=VALUE + </td> + <td> + A comma separated list of the allocation + roles that frameworks in this cluster may + belong to. + </td> + </tr> + <tr> + <td> + --[no-]root_submissions + </td> + <td> + Can root submit frameworks? (default: true) + </td> + </tr> + <tr> + <td> + --slave_reregister_timeout=VALUE + </td> + <td> + The timeout within which all slaves are expected to re-register + when a new master is elected as the leader. Slaves that do not + re-register within the timeout will be removed from the registry + and will be shut down if they attempt to communicate with the master. + <p/> + NOTE: This value has to be at least 10mins. (default: 10mins) + </td> + </tr> + <tr> + <td> + --user_sorter=VALUE + </td> + <td> + Policy to use for allocating resources + between users. May be one of: + <p/> + dominant_resource_fairness (drf) (default: drf) + </td> + </tr> + <tr> + <td> + --webui_dir=VALUE + </td> + <td> + Directory path of the webui files/assets (default: /usr/local/share/mesos/webui) + </td> + </tr> + <tr> + <td> + --weights=VALUE + </td> + <td> + A comma separated list of role/weight pairs + of the form 'role=weight,role=weight'. Weights + are used to indicate forms of priority. + </td> + </tr> + <tr> + <td> + --whitelist=VALUE + </td> + <td> + Path to a file with a list of slaves + (one per line) to advertise offers for. + <p/> + Path could be of the form <code>file:///path/to/file</code> or <code>/path/to/file</code>.
(default: *) + </td> + </tr> + <tr> + <td> + --zk_session_timeout=VALUE + </td> + <td> + ZooKeeper session timeout. (default: 10secs) + </td> + </tr> +</table> ## Slave Options *Required Flags* -``` - --master=VALUE May be one of: - zk://host1:port1,host2:port2,.../path - zk://username:password@host1:port1,host2:port2,.../path - file://path/to/file (where file contains one of the above) -``` +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --master=VALUE + </td> + <td> + This specifies how to connect to a master or a quorum of masters. This flag works with 3 different techniques. It may be one of: + <ol> + <li> hostname or ip to a master or comma-delimited list of masters, e.g., +<pre><code>--master=localhost:5050 +--master=10.0.0.5:5050,10.0.0.6:5050 +</code></pre> + </li> + + <li> zookeeper or quorum hostname/ip + port + master registration path </li> +<pre><code>--master=zk://host1:port1,host2:port2,.../path +--master=zk://username:password@host1:port1,host2:port2,.../path +</code></pre> + </li> + + <li> a path to a file containing either one of the above options </li> +<pre><code> --master=file://path/to/file (where file contains one of the above)</code></pre> + </li> + </ol> + Examples: + + </td> + </tr> +</table> *Optional Flags* -``` - --attributes=VALUE Attributes of machine - - --[no-]cgroups_enable_cfs Cgroups feature flag to enable hard limits on CPU resources - via the CFS bandwidth limiting subfeature. - (default: false) - - --cgroups_hierarchy=VALUE The path to the cgroups hierarchy root - (default: /sys/fs/cgroup) - - --cgroups_root=VALUE Name of the root cgroup - (default: mesos) - - --cgroups_subsystems=VALUE This flag has been deprecated and is no longer used, - please update your flags - - --[no-]checkpoint Whether to checkpoint slave and frameworks information - to disk. 
This enables a restarted slave to recover - status updates and reconnect with (--recover=reconnect) or - kill (--recover=cleanup) old executors (default: true) - - --containerizer_path=VALUE The path to the external containerizer executable used when - external isolation is activated (--isolation=external). - - --credential=VALUE Path to a file containing a single line with - the 'principal' and 'secret' separated by whitespace. - Path could be of the form 'file:///path/to/file' or '/path/to/file' - - --default_container_image=VALUE The default container image to use if not specified by a task, - when using external containerizer - - --default_role=VALUE Any resources in the --resources flag that - omit a role, as well as any resources that - are not present in --resources but that are - automatically detected, will be assigned to - this role. (default: *) - - --disk_watch_interval=VALUE Periodic time interval (e.g., 10secs, 2mins, etc) - to check the disk usage (default: 1mins) - - --executor_registration_timeout=VALUE Amount of time to wait for an executor - to register with the slave before considering it hung and - shutting it down (e.g., 60secs, 3mins, etc) (default: 1mins) - - --executor_shutdown_grace_period=VALUE Amount of time to wait for an executor - to shut down (e.g., 60secs, 3mins, etc) (default: 5secs) - - --frameworks_home=VALUE Directory prepended to relative executor URIs (default: ) - - --gc_delay=VALUE Maximum amount of time to wait before cleaning up - executor directories (e.g., 3days, 2weeks, etc). - Note that this delay may be shorter depending on - the available disk usage. (default: 1weeks) - - --hadoop_home=VALUE Where to find Hadoop installed (for - fetching framework executors from HDFS) - (no default, look for HADOOP_HOME in - environment or find hadoop on PATH) (default: ) - - --hostname=VALUE The hostname the slave should report. - If left unset, the hostname is resolved from the IP address that the slave binds to. 
- - --isolation=VALUE Isolation mechanisms to use, e.g., 'posix/cpu,posix/mem' - or 'cgroups/cpu,cgroups/mem' or 'external'. (default: posix/cpu,posix/mem) - - --launcher_dir=VALUE Location of Mesos binaries (default: /usr/local/libexec/mesos) - - --recover=VALUE Whether to recover status updates and reconnect with old executors. - Valid values for 'recover' are - reconnect: Reconnect with any old live executors. - cleanup : Kill any old live executors and exit. - Use this option when doing an incompatible slave - or executor upgrade!). - NOTE: If checkpointed slave doesn't exist, no recovery is performed - and the slave registers with the master as a new slave. (default: reconnect) - - --recovery_timeout=VALUE Amount of time alloted for the slave to recover. If the slave takes - longer than recovery_timeout to recover, any executors that are - waiting to reconnect to the slave will self-terminate. - NOTE: This flag is only applicable when checkpoint is enabled. - (default: 15mins) - - --registration_backoff_factor=VALUE Slave initially picks a random amount of time between [0, b], where - b = register_backoff_factor, to (re-)register with a new master. - Subsequent retries are exponentially backed off based on this - interval (e.g., 1st retry uses a random value between [0, b * 2^1], - 2nd retry between [0, b * 2^2], 3rd retry between [0, b * 2^3] etc) - up to a maximum of 1mins (default: 1secs) - - --resource_monitoring_interval=VALUE Periodic time interval for monitoring executor - resource usage (e.g., 10secs, 1min, etc) (default: 1secs) - - --resources=VALUE Total consumable resources per slave, in - the form 'name(role):value;name(role):value...'. - - --slave_subsystems=VALUE List of comma-separated cgroup subsystems to run the slave binary - in, e.g., 'memory,cpuacct'. The default is none. - Present functionality is intended for resource monitoring and - no cgroup limits are set, they are inherited from the root mesos - cgroup. 
- - --[no-]strict If strict=true, any and all recovery errors are considered fatal. - If strict=false, any expected errors (e.g., slave cannot recover - information about an executor, because the slave died right before - the executor registered.) during recovery are ignored and as much - state as possible is recovered. - (default: true) - - --[no-]switch_user Whether to run tasks as the user who - submitted them rather than the user running - the slave (requires setuid permission) (default: true) - - --work_dir=VALUE Where to place framework work directories - (default: /tmp/mesos) -``` +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --attributes=VALUE + </td> + <td> + Attributes of machine, in the form: + <p/> + <code>rack:2</code> or <code>'rack:2;u:1'</code> + </td> + </tr> + <tr> + <td> + --[no-]cgroups_enable_cfs + </td> + <td> + Cgroups feature flag to enable hard limits on CPU resources + via the CFS bandwidth limiting subfeature. + (default: false) + </td> + </tr> + <tr> + <td> + --cgroups_hierarchy=VALUE + </td> + <td> + The path to the cgroups hierarchy root + (default: /sys/fs/cgroup) + </td> + </tr> + <tr> + <td> + --[no-]cgroups_limit_swap + </td> + <td> + Cgroups feature flag to enable memory limits on both memory and + swap instead of just memory. + (default: false) + </td> + </tr> + <tr> + <td> + --cgroups_root=VALUE + </td> + <td> + Name of the root cgroup + (default: mesos) + </td> + </tr> + <tr> + <td> + --cgroups_subsystems=VALUE + </td> + <td> + This flag has been deprecated and is no longer used, + please update your flags + </td> + </tr> + <tr> + <td> + --[no-]checkpoint + </td> + <td> + This flag is deprecated and will be removed in a future release. + Whether to checkpoint slave and frameworks information + to disk. 
This enables a restarted slave to recover + status updates and reconnect with (--recover=reconnect) or + kill (--recover=cleanup) old executors (default: true) + </td> + </tr> + <tr> + <td> + --containerizer_path=VALUE + </td> + <td> + The path to the external containerizer executable used when + external isolation is activated (--isolation=external). + + </td> + </tr> + <tr> + <td> + --containerizers=VALUE + </td> + <td> + Comma separated list of containerizer implementations + to compose in order to provide containerization. + <p/> + Available options are 'mesos', 'external', and + 'docker' (on Linux). The order the containerizers + are specified is the order they are tried + (--containerizers=mesos). + (default: mesos) + </td> + </tr> + <tr> + <td> + --credential=VALUE + </td> + <td> + Either a path to a text file with a single line + containing 'principal' and 'secret' separated by whitespace, + <p/> + or a path to a file containing the JSON formatted information used for one credential. + <p/> + Path could be of the form <code>file:///path/to/file</code> or <code>/path/to/file</code>. + <p/> + JSON file example: +<pre><code>{ + "principal": "username", + "secret": "secret" +}</code></pre> + </td> + </tr> + <tr> + <td> + --default_container_image=VALUE + </td> + <td> + The default container image to use if not specified by a task, + when using external containerizer. + + </td> + </tr> + <tr> + <td> + --default_container_info=VALUE + </td> + <td> + JSON formatted ContainerInfo that will be included into + any ExecutorInfo that does not specify a ContainerInfo. + <p/> + See the ContainerInfo protobuf in mesos.proto for + the expected format.
+ <p/> + Example: +<pre><code>{ + "type": "MESOS", + "volumes": [ + { + "host_path": "./.private/tmp", + "container_path": "/tmp", + "mode": "RW" + } + ] +}</code></pre> + </td> + </tr> + <tr> + <td> + --default_role=VALUE + </td> + <td> + Any resources in the --resources flag that + omit a role, as well as any resources that + are not present in --resources but that are + automatically detected, will be assigned to + this role. (default: *) + </td> + </tr> + <tr> + <td> + --disk_watch_interval=VALUE + </td> + <td> + Periodic time interval (e.g., 10secs, 2mins, etc) + to check the disk usage (default: 1mins) + </td> + </tr> + <tr> + <td> + --docker=VALUE + </td> + <td> + The absolute path to the docker executable for docker + containerizer. + (default: docker) + </td> + </tr> + <tr> + <td> + --docker_remove_delay=VALUE + </td> + <td> + The amount of time to wait before removing docker containers + (e.g., 3days, 2weeks, etc). + (default: 6hrs) + </td> + </tr> + <tr> + <td> + --docker_sandbox_directory=VALUE + </td> + <td> + The absolute path for the directory in the container where the + sandbox is mapped to. + (default: /mnt/mesos/sandbox) + </td> + </tr> + <tr> + <td> + --executor_registration_timeout=VALUE + </td> + <td> + Amount of time to wait for an executor + to register with the slave before considering it hung and + shutting it down (e.g., 60secs, 3mins, etc) (default: 1mins) + </td> + </tr> + <tr> + <td> + --executor_shutdown_grace_period=VALUE + </td> + <td> + Amount of time to wait for an executor + to shut down (e.g., 60secs, 3mins, etc) (default: 5secs) + </td> + </tr> + <tr> + <td> + --frameworks_home=VALUE + </td> + <td> + Directory path prepended to relative executor URIs (default: ) + </td> + </tr> + <tr> + <td> + --gc_delay=VALUE + </td> + <td> + Maximum amount of time to wait before cleaning up + executor directories (e.g., 3days, 2weeks, etc). + <p/> + Note that this delay may be shorter depending on + the available disk usage. 
(default: 1weeks) + </td> + </tr> + <tr> + <td> + --hadoop_home=VALUE + </td> + <td> + Path to find Hadoop installed (for + fetching framework executors from HDFS) + (no default, look for HADOOP_HOME in + environment or find hadoop on PATH) (default: ) + </td> + </tr> + <tr> + <td> + --hostname=VALUE + </td> + <td> + The hostname the slave should report. + <p/> + If left unset, the hostname is resolved from the IP address + that the slave binds to. + </td> + </tr> + <tr> + <td> + --isolation=VALUE + </td> + <td> + Isolation mechanisms to use, e.g., 'posix/cpu,posix/mem', or + 'cgroups/cpu,cgroups/mem', or network/port_mapping + (configure with flag: --with-network-isolator to enable), + or 'external', or load an alternate isolator module using + the <code>--modules</code> flag. (default: posix/cpu,posix/mem) + </td> + </tr> + <tr> + <td> + --launcher_dir=VALUE + </td> + <td> + Directory path of Mesos binaries (default: /usr/local/lib/mesos) + </td> + </tr> + <tr> + <td> + --modules=VALUE + </td> + <td> + List of modules to be loaded and be available to the internal + subsystems. + <p/> + Use <code>--modules=filepath</code> to specify the list of modules via a + file containing a JSON formatted string. 'filepath' can be + of the form <code>file:///path/to/file</code> or <code>/path/to/file</code>. + <p/> + Use <code>--modules="{...}"</code> to specify the list of modules inline. + <p/> + JSON file example: +<pre><code> +{ + "libraries": [ + { + "file": "/path/to/libfoo.so", + "modules": [ + { + "name": "org_apache_mesos_bar", + "parameters": [ + { + "key": "X", + "value": "Y" + } + ] + }, + { + "name": "org_apache_mesos_baz" + } + ] + }, + { + "name": "qux", + "modules": [ + { + "name": "org_apache_mesos_norf" + } + ] + } + ] +}</code></pre> + </td> + </tr> + <tr> + <td> + --perf_duration=VALUE + </td> + <td> + Duration of a perf stat sample. The duration must be less + that the perf_interval. 
(default: 10secs) + </td> + </tr> + <tr> + <td> + --perf_events=VALUE + </td> + <td> + List of comma-separated perf events to sample for each container + when using the perf_event isolator. Default is none. + <p/> + Run command 'perf list' to see all events. Event names are + sanitized by downcasing and replacing hyphens with underscores + when reported in the PerfStatistics protobuf, e.g., cpu-cycles + becomes cpu_cycles; see the PerfStatistics protobuf for all names. + </td> + </tr> + <tr> + <td> + --perf_interval=VALUE + </td> + <td> + Interval between the start of perf stat samples. Perf samples are + obtained periodically according to perf_interval and the most + recently obtained sample is returned rather than sampling on + demand. For this reason, perf_interval is independent of the + resource monitoring interval (default: 1mins) + </td> + </tr> + <tr> + <td> + --recover=VALUE + </td> + <td> + Whether to recover status updates and reconnect with old executors. + <p/> + Valid values for 'recover' are + <p/> + reconnect: Reconnect with any old live executors. + <p/> + cleanup : Kill any old live executors and exit. + <p/> + Use this option when doing an incompatible slave + or executor upgrade! + <p/> + NOTE: If the checkpointed slave doesn't exist, no recovery is performed + and the slave registers with the master as a new slave. (default: reconnect) + </td> + </tr> + <tr> + <td> + --recovery_timeout=VALUE + </td> + <td> + Amount of time allotted for the slave to recover. If the slave takes + longer than recovery_timeout to recover, any executors that are + waiting to reconnect to the slave will self-terminate. + <p/> + NOTE: This flag is only applicable when checkpoint is enabled. + (default: 15mins) + </td> + </tr> + <tr> + <td> + --registration_backoff_factor=VALUE + </td> + <td> + Slave initially picks a random amount of time between [0, b], where + b = registration_backoff_factor, to (re-)register with a new master.
 + <p/> + Subsequent retries are exponentially backed off based on this + interval (e.g., 1st retry uses a random value between [0, b * 2^1], + 2nd retry between [0, b * 2^2], 3rd retry between [0, b * 2^3] etc) + up to a maximum of 1mins (default: 1secs) + </td> + </tr> + <tr> + <td> + --resource_monitoring_interval=VALUE + </td> + <td> + Periodic time interval for monitoring executor + resource usage (e.g., 10secs, 1min, etc) (default: 1secs) + </td> + </tr> + <tr> + <td> + --resources=VALUE + </td> + <td> + Total consumable resources per slave, in the form + <p/> + <code>name(role):value;name(role):value...</code>. + </td> + </tr> + <tr> + <td> + --slave_subsystems=VALUE + </td> + <td> + List of comma-separated cgroup subsystems to run the slave binary + in, e.g., <code>memory,cpuacct</code>. The default is none. + Present functionality is intended for resource monitoring and + no cgroup limits are set; they are inherited from the root mesos + cgroup. + </td> + </tr> + <tr> + <td> + --[no-]strict + </td> + <td> + If strict=true, any and all recovery errors are considered fatal. + <p/> + If strict=false, any expected errors (e.g., slave cannot recover + information about an executor, because the slave died right before + the executor registered.) during recovery are ignored and as much + state as possible is recovered. 
 + (default: true) + </td> + </tr> + <tr> + <td> + --[no-]switch_user + </td> + <td> + Whether to run tasks as the user who + submitted them rather than the user running + the slave (requires setuid permission) (default: true) + </td> + </tr> + <tr> + <td> + --work_dir=VALUE + </td> + <td> + Directory path to place framework work directories + (default: /tmp/mesos) + </td> + </tr> +</table> ## Mesos Build Configuration Options -The configure script has the following options: +### The configure script has the following flags for optional features: + +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --enable-shared[=PKGS] + </td> + <td> + build shared libraries [default=yes] + </td> + </tr> + <tr> + <td> + --enable-static[=PKGS] + </td> + <td> + build static libraries [default=yes] + </td> + </tr> + <tr> + <td> + --enable-fast-install[=PKGS] + </td> + <td> + + optimize for fast installation [default=yes] + </td> + </tr> + <tr> + <td> + --disable-libtool-lock + </td> + <td> + avoid locking (might break parallel builds) + </td> + </tr> + <tr> + <td> + --disable-java + </td> + <td> + don't build Java bindings + </td> + </tr> + <tr> + <td> + --disable-python + </td> + <td> + don't build Python bindings + </td> + </tr> + <tr> + <td> + --enable-debug + </td> + <td> + enable debugging. If CFLAGS/CXXFLAGS are set, this + option won't change them default: no + </td> + </tr> + <tr> + <td> + --enable-optimize + </td> + <td> + enable optimizations. 
If CFLAGS/CXXFLAGS are set, + this option won't change them default: no + </td> + </tr> + <tr> + <td> + --disable-bundled + </td> + <td> + build against preinstalled dependencies instead of + bundled libraries + </td> + </tr> + <tr> + <td> + --disable-bundled-distribute + </td> + <td> + + excludes building and using the bundled distribute + package in lieu of an installed version in + PYTHONPATH + </td> + </tr> + <tr> + <td> + --disable-bundled-pip + </td> + <td> + excludes building and using the bundled pip package + in lieu of an installed version in PYTHONPATH + </td> + </tr> + <tr> + <td> + --disable-bundled-wheel + </td> + <td> + excludes building and using the bundled wheel + package in lieu of an installed version in + PYTHONPATH + </td> + </tr> + <tr> + <td> + --disable-python-dependency-install + </td> + <td> + + when the python packages are installed during make + install, no external dependencies are downloaded or + installed + </td> + </tr> +</table> + +### The configure script has the following flags for optional packages: + +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + --with-gnu-ld + </td> + <td> + assume the C compiler uses GNU ld [default=no] + </td> + </tr> + <tr> + <td> + --with-sysroot=DIR + </td> + <td> + Search for dependent libraries within DIR + (or the compiler's sysroot if not specified). 
 + </td> + </tr> + <tr> + <td> + --with-zookeeper[=DIR] + </td> + <td> + excludes building and using the bundled ZooKeeper + package in lieu of an installed version at a + location prefixed by the given path + </td> + </tr> + <tr> + <td> + --with-leveldb[=DIR] + </td> + <td> + excludes building and using the bundled LevelDB + package in lieu of an installed version at a + location prefixed by the given path + </td> + </tr> + <tr> + <td> + --with-glog[=DIR] + </td> + <td> + excludes building and using the bundled glog package + in lieu of an installed version at a location + prefixed by the given path + </td> + </tr> + <tr> + <td> + --with-protobuf[=DIR] + </td> + <td> + excludes building and using the bundled protobuf + package in lieu of an installed version at a + location prefixed by the given path + </td> + </tr> + <tr> + <td> + --with-gmock[=DIR] + </td> + <td> + excludes building and using the bundled gmock + package in lieu of an installed version at a + location prefixed by the given path + </td> + </tr> + <tr> + <td> + --with-curl[=DIR] + </td> + <td> + specify where to locate the curl library + </td> + </tr> + <tr> + <td> + --with-sasl[=DIR] + </td> + <td> + specify where to locate the sasl2 library + </td> + </tr> + <tr> + <td> + --with-zlib[=DIR] + </td> + <td> + specify where to locate the zlib library + </td> + </tr> + <tr> + <td> + --with-apr[=DIR] + </td> + <td> + specify where to locate the apr-1 library + </td> + </tr> + <tr> + <td> + --with-svn[=DIR] + </td> + <td> + specify where to locate the svn-1 library + </td> + </tr> + <tr> + <td> + --with-network-isolator + </td> + <td> + builds the network isolator + </td> + </tr> +</table> -``` -To assign environment variables (e.g., CC, CFLAGS...), specify them as -VAR=VALUE. See below for descriptions of some of the useful variables. - -Defaults for the options are specified in brackets. 
- -Configuration: - -h, --help display this help and exit - --help=short display options specific to this package - --help=recursive display the short help of all the included packages - -V, --version display version information and exit - -q, --quiet, --silent do not print `checking...' messages - --cache-file=FILE cache test results in FILE [disabled] - -C, --config-cache alias for `--cache-file=config.cache' - -n, --no-create do not create output files - --srcdir=DIR find the sources in DIR [configure dir or `..'] - -Installation directories: - --prefix=PREFIX install architecture-independent files in PREFIX - [/usr/local] - --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX - [PREFIX] - -By default, `make install' will install all the files in -`/usr/local/bin', `/usr/local/lib' etc. You can specify -an installation prefix other than `/usr/local' using `--prefix', -for instance `--prefix=$HOME'. - -For better control, use the options below. - -Fine tuning of the installation directories: - --bindir=DIR user executables [EPREFIX/bin] - --sbindir=DIR system admin executables [EPREFIX/sbin] - --libexecdir=DIR program executables [EPREFIX/libexec] - --sysconfdir=DIR read-only single-machine data [PREFIX/etc] - --sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com] - --localstatedir=DIR modifiable single-machine data [PREFIX/var] - --libdir=DIR object code libraries [EPREFIX/lib] - --includedir=DIR C header files [PREFIX/include] - --oldincludedir=DIR C header files for non-gcc [/usr/include] - --datarootdir=DIR read-only arch.-independent data root [PREFIX/share] - --datadir=DIR read-only architecture-independent data [DATAROOTDIR] - --infodir=DIR info documentation [DATAROOTDIR/info] - --localedir=DIR locale-dependent data [DATAROOTDIR/locale] - --mandir=DIR man documentation [DATAROOTDIR/man] - --docdir=DIR documentation root [DATAROOTDIR/doc/mesos] - --htmldir=DIR html documentation [DOCDIR] - --dvidir=DIR dvi documentation 
[DOCDIR] - --pdfdir=DIR pdf documentation [DOCDIR] - --psdir=DIR ps documentation [DOCDIR] - -Program names: - --program-prefix=PREFIX prepend PREFIX to installed program names - --program-suffix=SUFFIX append SUFFIX to installed program names - --program-transform-name=PROGRAM run sed PROGRAM on installed program names - -System types: - --build=BUILD configure for building on BUILD [guessed] - --host=HOST cross-compile to build programs to run on HOST [BUILD] - --target=TARGET configure for building compilers for TARGET [HOST] - -Optional Features: - --disable-option-checking ignore unrecognized --enable/--with options - --disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no) - --enable-FEATURE[=ARG] include FEATURE [ARG=yes] - --enable-shared[=PKGS] build shared libraries [default=yes] - --enable-static[=PKGS] build static libraries [default=yes] - --enable-fast-install[=PKGS] - optimize for fast installation [default=yes] - --disable-dependency-tracking speeds up one-time build - --enable-dependency-tracking do not reject slow dependency extractors - --disable-libtool-lock avoid locking (might break parallel builds) - --disable-java don't build Java bindings - --disable-python don't build Python bindings - --disable-optimize don't try to compile with optimizations - --disable-bundled build against preinstalled dependencies instead of - bundled libraries - -Optional Packages: - --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] - --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) - --with-pic[=PKGS] try to use only PIC/non-PIC objects [default=use - both] - --with-gnu-ld assume the C compiler uses GNU ld [default=no] - --with-sysroot=DIR Search for dependent libraries within DIR - (or the compiler's sysroot if not specified). 
- --with-zookeeper[=DIR] excludes building and using the bundled ZooKeeper - package in lieu of an installed version at a - location prefixed by the given path - --with-leveldb[=DIR] excludes building and using the bundled LevelDB - package in lieu of an installed version at a - location prefixed by the given path - --without-cxx11 builds Mesos without C++11 support (deprecated) - --with-network-isolator builds the network isolator - -Some influential environment variables: - CC C compiler command - CFLAGS C compiler flags - LDFLAGS linker flags, e.g. -L<lib dir> if you have libraries in a - nonstandard directory <lib dir> - LIBS libraries to pass to the linker, e.g. -l<library> - CPPFLAGS C/C++/Objective C preprocessor flags, e.g. -I<include dir> if - you have headers in a nonstandard directory <include dir> - CXX C++ compiler command - CXXFLAGS C++ compiler flags - CPP C preprocessor - CXXCPP C++ preprocessor - JAVA_HOME location of Java Development Kit (JDK) - JAVA_CPPFLAGS - preprocessor flags for JNI - JAVA_JVM_LIBRARY - full path to libjvm.so - MAVEN_HOME looks for mvn at MAVEN_HOME/bin/mvn - PYTHON which Python interpreter to use - PYTHON_VERSION - The installed Python version to use, for example '2.3'. This - string will be appended to the Python interpreter canonical - name. +### Some influential environment variables for configure script: Use these variables to override the choices made by `configure' or to help it to find libraries and programs with nonstandard names/locations. 
-``` + +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Variable + </th> + <th> + Explanation + </th> + </thead> + <tr> + <td> + JAVA_HOME + </td> + <td> + location of Java Development Kit (JDK) + </td> + </tr> + <tr> + <td> + JAVA_CPPFLAGS + </td> + <td> + preprocessor flags for JNI + </td> + </tr> + <tr> + <td> + JAVA_JVM_LIBRARY + </td> + <td> + full path to libjvm.so + </td> + </tr> + <tr> + <td> + MAVEN_HOME + </td> + <td> + looks for mvn at MAVEN_HOME/bin/mvn + </td> + </tr> + <tr> + <td> + PROTOBUF_JAR + </td> + <td> + full path to protobuf jar on prefixed builds + </td> + </tr> + <tr> + <td> + PYTHON + </td> + <td> + which Python interpreter to use + </td> + </tr> + <tr> + <td> + PYTHON_VERSION + </td> + <td> + The installed Python version to use, for example '2.3'. This + string will be appended to the Python interpreter canonical + name. + </td> + </tr> +</table> Added: mesos/site/source/documentation/latest/external-containerizer.md URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/external-containerizer.md?rev=1638021&view=auto ============================================================================== --- mesos/site/source/documentation/latest/external-containerizer.md (added) +++ mesos/site/source/documentation/latest/external-containerizer.md Tue Nov 11 04:11:00 2014 @@ -0,0 +1,494 @@ +--- +layout: documentation +--- + +# External Containerizer + + +* EC = external containerizer. A part of the Mesos slave that provides +an API for containerizing via external plugin executables. +* ECP = external containerizer program. An external plugin executable +implementing the actual containerizing by interfacing with a +containerizing system (e.g. Docker). + +# Containerizing + + +# General Overview + +EC invokes ECP as a shell process, passing the command as a parameter +to the ECP executable. Additional data is exchanged via stdin and +stdout. 
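The invocation scheme described above can be sketched in Python as a small command dispatcher. This is only an illustration under stated assumptions, not the Mesos implementation: the function name `dispatch` and the `handlers` mapping are hypothetical, and the sketch assumes the command name arrives as the only argument and that success is reported with a zero exit code.

```python
import sys


def dispatch(argv, handlers):
    """Run the handler named by argv[1] and return a process exit code.

    `handlers` is a hypothetical mapping from a command name (for example
    "launch" or "wait") to a zero-argument callable; it is not part of Mesos.
    """
    program = argv[0] if argv else "ecp"
    if len(argv) != 2 or argv[1] not in handlers:
        # Diagnostics go to stderr so that stdout stays free for result data.
        print("usage: %s <%s>" % (program, "|".join(sorted(handlers))),
              file=sys.stderr)
        return 1
    try:
        handlers[argv[1]]()
    except Exception as error:
        # Any failure inside a handler is reported as a non-zero exit code.
        print("%s failed: %s" % (argv[1], error), file=sys.stderr)
        return 1
    return 0
```

A real program would likely finish with `sys.exit(dispatch(sys.argv, handlers))`, with each handler reading its input from stdin and writing any result to stdout.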
 + +The ECP is expected to return a zero exit code for all commands it was +able to process. A non-zero exit code signals an error. Below you +will find an overview of the commands that have to be implemented by +an ECP, as well as their invocation scheme. + +The ECP is expected to use stderr for state information and +additional debug output. That output is logged to +a file; see [Environment: **Sandbox**](#sandbox). + + +### Call and communication scheme + +Interface describing the functions an ECP has to implement via +command calls. Many invocations on the ECP will also pass a +protobuf message along via stdin. Some invocations on the ECP are also +expected to deliver a result protobuf message back via stdout. +All protobuf messages are prefixed by their original length - +this is sometimes referred to as "Record-IO"-format. See +[Record-IO De/Serializing Example](#record-io-deserializing-example). + +**COMMAND < INPUT-PROTO > RESULT-PROTO** + +* `launch < containerizer::Launch` +* `update < containerizer::Update` +* `usage < containerizer::Usage > mesos::ResourceStatistics` +* `wait < containerizer::Wait > containerizer::Termination` +* `destroy < containerizer::Destroy` +* `containers > containerizer::Containers` +* `recover` + + +# Command Ordering + +## Make no assumptions +Commands may arrive in almost any order. There is only one +exception to this rule: when launching a task, the EC will make sure +that the ECP first receives a `launch` on that specific container; all +other commands are queued until `launch` returns from the ECP. + + +# Use Cases + +## Task Launching EC / ECP Overview + +* EC invokes `launch` on the ECP. + * Along with that call, the ECP will receive a containerizer::Launch + protobuf message via stdin. + * ECP now makes sure the executor gets started. +**Note** that `launch` is not supposed to block. 
It should return +immediately after triggering the executor/command - that could be done +via fork-exec within the ECP. +* EC invokes `wait` on the ECP. + * Along with that call, the ECP will receive a containerizer::Wait + protobuf message via stdin. + * ECP now blocks until the launched command is reaped - that could be + implemented via waitpid within the ECP. + * Once the command is reaped, the ECP should deliver a + containerizer::Termination protobuf message via stdout, back to the + EC. + + +## Container Lifecycle Sequence Diagrams + + +### Container Launching + +A container is in a staging state and now gets started and observed +until it reaches a final state. + + + +### Container Running + +A container was launched at some point and is now considered +by the slave to be in a non-terminal state. The following commands +will be triggered multiple times on the ECP over the lifetime of a +container. Their order, however, is not determined. + + + +### Resource Limitation + +While a container is active, a resource limitation was identified +(e.g. out of memory) by the ECP isolation mechanism of choice. + + + +## Slave Recovery Overview + +* Slave recovers via checkpointed state. +* EC invokes `recover` on the ECP - there is no protobuf message sent +or expected as a result from this command. + * The ECP may try to recover internal states via its own failover +mechanisms, if needed. +* After `recover` returns, the EC will invoke `containers` on the ECP. + * The ECP should return Containers, which is a list of currently + active containers. +**Note** these containers are known to the ECP but might in fact +partially be unknown to the slave (e.g. slave failed after launch but +before or within wait) - those containers are considered to be +orphans. +* The EC now compares the list of slave known containers to those +listed within `Containers`. 
For each orphan it identifies, the slave +will invoke a `wait` followed by a `destroy` on the ECP for those +containers. +* Slave will now call `wait` on the ECP (via EC) for all recovered +containers. This once again puts `wait` into the position of the +ultimate command reaper. + + +## Slave Recovery Sequence Diagram + +### Recovery + +While containers are active, the slave fails over. + + + +### Orphan Destruction + +Containers that the ECP reports as active but that cannot be recovered +from slave state are terminated. + + + + +# Command Details + +## launch +### Start the containerized executor + +Hands over all information the ECP needs for launching a task +via an executor. +This call should not wait for the executor/command to return. The +actual reaping of the containerized command is done via the `wait` +call. + + launch < containerizer::Launch + +This call receives the containerizer::Launch protobuf via stdin. + + /** + * Encodes the launch command sent to the external containerizer + * program. + */ + message Launch { + required ContainerID container_id = 1; + optional TaskInfo task_info = 2; + optional ExecutorInfo executor_info = 3; + optional string directory = 4; + optional string user = 5; + optional SlaveID slave_id = 6; + optional string slave_pid = 7; + optional bool checkpoint = 8; + } + +This call does not return any data via stdout. + +## wait +### Gets information on the containerized executor's Termination + +Is expected to reap the executor/command. This call should block +until the executor/command has terminated. + + wait < containerizer::Wait > containerizer::Termination + +This call receives the containerizer::Wait protobuf via stdin. + + /** + * Encodes the wait command sent to the external containerizer + * program. + */ + message Wait { + required ContainerID container_id = 1; + } + +This call is expected to return containerizer::Termination via stdout. 
 + + /** + * Information about a container termination, returned by the + * containerizer to the slave. + */ + message Termination { + // A container may be killed if it exceeds its resources; this will + // be indicated by killed=true and described by the message string. + required bool killed = 1; + required string message = 2; + + // Exit status of the process. + optional int32 status = 3; + } + +The Termination attribute `killed` is to be set only when the +containerizer or the underlying isolation had to enforce a limitation +by killing the task (e.g. task exceeded suggested memory limit). + +## update +### Updates the container's resource limits + +Sends (new) resource constraints for the given container. +Resource constraints on a container may vary over the lifetime of +the containerized task. + + update < containerizer::Update + +This call receives the containerizer::Update protobuf via stdin. + + /** + * Encodes the update command sent to the external containerizer + * program. + */ + message Update { + required ContainerID container_id = 1; + repeated Resource resources = 2; + } + +This call does not return any data via stdout. + +## usage +### Gathers resource usage statistics for a containerized task +Is used for polling the current resource usage for the given container. + + usage < containerizer::Usage > mesos::ResourceStatistics + +This call receives the containerizer::Usage protobuf via stdin. + + /** + * Encodes the usage command sent to the external containerizer + * program. + */ + message Usage { + required ContainerID container_id = 1; + } + +This call is expected to return mesos::ResourceStatistics via stdout. + + /* + * A snapshot of resource usage statistics. + */ + message ResourceStatistics { + required double timestamp = 1; // Snapshot time, in seconds since the Epoch. + + // CPU Usage Information: + // Total CPU time spent in user mode, and kernel mode. 
+ optional double cpus_user_time_secs = 2; + optional double cpus_system_time_secs = 3; + + // Number of CPUs allocated. + optional double cpus_limit = 4; + + // cpu.stat on process throttling (for contention issues). + optional uint32 cpus_nr_periods = 7; + optional uint32 cpus_nr_throttled = 8; + optional double cpus_throttled_time_secs = 9; + + // Memory Usage Information: + optional uint64 mem_rss_bytes = 5; // Resident Set Size. + + // Amount of memory resources allocated. + optional uint64 mem_limit_bytes = 6; + + // Broken out memory usage information (files, anonymous, and mmaped files) + optional uint64 mem_file_bytes = 10; + optional uint64 mem_anon_bytes = 11; + optional uint64 mem_mapped_file_bytes = 12; + } + +## destroy +### Terminates the containerized executor + +Is used in rare situations, like for graceful slave shutdown +but also in slave fail over scenarios - see Slave Recovery for more. + + destroy < containerizer::Destroy + +This call receives the containerizer::Destroy protobuf via stdin. + + /** + * Encodes the destroy command sent to the external containerizer + * program. + */ + message Destroy { + required ContainerID container_id = 1; + } + +This call does not return any data via stdout. + +## containers +### Gets all active container-id's + +Returns all container identifiers known to be currently active. + + containers > containerizer::Containers + +This call does not receive any additional data via stdin. + +This call is expected to pass containerizer::Containers back via +stdout. + + /** + * Information on all active containers returned by the containerizer + * to the slave. + */ + message Containers { + repeated ContainerID containers = 1; + } + + +## recover +### Internal ECP state recovery + +Allows the ECP to do a state recovery on its own. If the ECP +uses state check-pointing e.g. via file system, then this call would +be a good moment to de-serialize that state information. 
Make sure you +also see [Slave Recovery Overview](#slave-recovery-overview) for more. + + recover + +This call does not receive any additional data via stdin. +This call does not return any data via stdout. + + + +### Protobuf Message Definitions + +For possibly more up-to-date versions of the above mentioned protobufs +as well as protobuf messages referenced by them, please check: + +* containerizer::XXX are defined within + include/mesos/containerizer/containerizer.proto. + +* mesos::XXX are defined within include/mesos/mesos.proto. + + + +# Environment + +## **Sandbox** + +A sandbox environment is formed by `cd` into the work-directory of the +executor as well as a stderr redirect into the executor's "stderr" +log-file. +**Note** not **all** invocations have a complete sandbox environment. + + +## Additional Environment Variables + +Additionally, there are a few new environment variables set when +invoking the ECP. + + +* MESOS_LIBEXEC_DIRECTORY = path to mesos-executor, mesos-usage, ... +This information is always present. + +* MESOS_WORK_DIRECTORY = slave work directory. This should be used for +distinguishing slave instances. +This information is always present. + +**Note** that this is specifically helpful for being able to tie a set +of containers to a specific slave instance, thus allowing proper +recovery when needed. + +* MESOS_DEFAULT_CONTAINER_IMAGE = default image as provided via slave +flags (default_container_image). This variable is provided only in +calls to `launch`. + + + +# Debugging + +## Enhanced Verbosity Logging + +To receive an increased level of status information from the EC, +use the GLOG verbosity level. Prefix your mesos startup call by +setting the level to a value higher than or equal to two. + +`GLOG_v=2 ./bin/mesos-slave --master=[...]` + + +## ECP stderr Logging + +All output to stderr of your ECP will get logged to the executor's +'stderr' log file. 
+The specific location can be extracted from the [Enhanced Verbosity +Logging](#enhanced-verbosity-logging) of the EC. + +Example Log Output: + + I0603 02:12:34.165662 174215168 external_containerizer.cpp:1083] Invoking external containerizer for method 'launch' + I0603 02:12:34.165675 174215168 external_containerizer.cpp:1100] calling: [/Users/till/Development/mesos-till/build/src/test-containerizer launch] + I0603 02:12:34.165678 175824896 slave.cpp:497] Successfully attached file '/tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345' + I0603 02:12:34.165686 174215168 external_containerizer.cpp:1101] directory: /tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345 + +The stderr output of the ECP for this call is found within the stderr file located in the directory displayed in the last quoted line. + + cat /tmp/ExternalContainerizerTest_Launch_lP22ci/slaves/20140603-021232-16777343-51377-7591-0/frameworks/20140603-021232-16777343-51377-7591-0000/executors/1/runs/558e0a69-70da-4d71-b4c4-c2820b1d6345/stderr + + +# Appendix + +## Record-IO Proto Example: Launch + +This is what a properly record-io formatted protobuf looks like. + +**name: offset** + +* length: 00 - 03 = record length in byte + +* payload: 04 - (length + 4) = protobuf payload + +Example length: 00000240h = 576 byte total protobuf size + +Example Hexdump: + + 00000000: 4002 0000 0a26 0a24 3433 3532 3533 6162 2d64 3234 362d 3437 :@....&.$435253ab-d246-47 + 00000018: 6265 2d61 3335 302d 3335 3432 3034 3635 6438 3638 1a81 020a :be-a350-35420465d868.... + 00000030: 030a 0131 2a16 0a04 6370 7573 1000 1a09 0900 0000 0000 0000 :...1*...cpus............ 
+ 00000048: 4032 012a 2a15 0a03 6d65 6d10 001a 0909 0000 0000 0000 9040 :@2.**...mem............@ + 00000060: 3201 2a2a 160a 0464 6973 6b10 001a 0909 0000 0000 0000 9040 :2.**...disk............@ + 00000078: 3201 2a2a 180a 0570 6f72 7473 1001 220a 0a08 0898 f201 1080 :2.**...ports.."......... + 00000090: fa01 3201 2a3a 2a1a 2865 6368 6f20 274e 6f20 7375 6368 2066 :..2.*:*.(echo 'No such f + 000000a8: 696c 6520 6f72 2064 6972 6563 746f 7279 273b 2065 7869 7420 :ile or directory'; exit + 000000c0: 3142 2b0a 2932 3031 3430 3532 362d 3031 3530 3036 2d31 3637 :1B+.)20140526-015006-167 + 000000d8: 3737 3334 332d 3535 3430 332d 3632 3536 372d 3030 3030 4a3d :77343-55403-62567-0000J= + 000000f0: 436f 6d6d 616e 6420 4578 6563 7574 6f72 2028 5461 736b 3a20 :Command Executor (Task: + 00000108: 3129 2028 436f 6d6d 616e 643a 2073 6820 2d63 2027 7768 696c :1) (Command: sh -c 'whil + 00000120: 6520 7472 7565 203b 2e2e 2e27 2952 0131 22c5 012f 746d 702f :e true ;...')R.1"../tmp/ + 00000138: 4578 7465 726e 616c 436f 6e74 6169 6e65 7269 7a65 7254 6573 :ExternalContainerizerTes + 00000150: 745f 4c61 756e 6368 5f6c 5855 6839 662f 736c 6176 6573 2f32 :t_Launch_lXUh9f/slaves/2 + 00000168: 3031 3430 3532 362d 3031 3530 3036 2d31 3637 3737 3334 332d :0140526-015006-16777343- + 00000180: 3535 3430 332d 3632 3536 372d 302f 6672 616d 6577 6f72 6b73 :55403-62567-0/frameworks + 00000198: 2f32 3031 3430 3532 362d 3031 3530 3036 2d31 3637 3737 3334 :/20140526-015006-1677734 + 000001b0: 332d 3535 3430 332d 3632 3536 372d 3030 3030 2f65 7865 6375 :3-55403-62567-0000/execu + 000001c8: 746f 7273 2f31 2f72 756e 732f 3433 3532 3533 6162 2d64 3234 :tors/1/runs/435253ab-d24 + 000001e0: 362d 3437 6265 2d61 3335 302d 3335 3432 3034 3635 6438 3638 :6-47be-a350-35420465d868 + 000001f8: 2a04 7469 6c6c 3228 0a26 3230 3134 3035 3236 2d30 3135 3030 :*.till2(.&20140526-01500 + 00000210: 362d 3136 3737 3733 3433 2d35 3534 3033 2d36 3235 3637 2d30 :6-16777343-55403-62567-0 + 00000228: 3a18 736c 6176 6528 3129 4031 
3237 2e30 2e30 2e31 3a35 3534 ::.slave(1)@127.0.0.1:554 + 00000240: 3033 4000 + +## Record-IO De/Serializing Example +How to send and receive such record-io formatted message +using Python + +*taken from src/examples/python/test_containerizer.py* + + # Read a data chunk prefixed by its total size from stdin. + def receive(): + # Read size (uint32 => 4 bytes). + size = struct.unpack('I', sys.stdin.read(4)) + if size[0] <= 0: + print >> sys.stderr, "Expected protobuf size over stdin. " \ + "Received 0 bytes." + return "" + + # Read payload. + data = sys.stdin.read(size[0]) + if len(data) != size[0]: + print >> sys.stderr, "Expected %d bytes protobuf over stdin. " \ + "Received %d bytes." % (size[0], len(data)) + return "" + + return data + + # Write a protobuf message prefixed by its total size (aka recordio) + # to stdout. + def send(data): + # Write size (uint32 => 4 bytes). + sys.stdout.write(struct.pack('I', len(data))) + + # Write payload. + sys.stdout.write(data) \ No newline at end of file Added: mesos/site/source/documentation/latest/mesos-containerizer.md URL: http://svn.apache.org/viewvc/mesos/site/source/documentation/latest/mesos-containerizer.md?rev=1638021&view=auto ============================================================================== --- mesos/site/source/documentation/latest/mesos-containerizer.md (added) +++ mesos/site/source/documentation/latest/mesos-containerizer.md Tue Nov 11 04:11:00 2014 @@ -0,0 +1,60 @@ +--- +layout: documentation +--- + +# Mesos Containerizer + +The MesosContainerizer provides lightweight containerization and +resource isolation of executors using Linux-specific functionality +such as control cgroups and namespaces. It is composable so operators +can selectively enable different isolators. + +It also provides basic support for POSIX systems (e.g., OSX) but +without any actual isolation, only resource usage reporting. 
+ +### Shared Filesystem + +The SharedFilesystem isolator can optionally be used on Linux hosts to +enable modifications to each container's view of the shared +filesystem. + +The modifications are specified in the ContainerInfo included in the +ExecutorInfo, either by a framework or by using the +--default\_container\_info slave flag. + +ContainerInfo specifies Volumes which map parts of the shared +filesystem (host\_path) into the container's view of the filesystem +(container\_path), as read-write or read-only. The host\_path can be +absolute, in which case it will make the filesystem subtree rooted at +host\_path also accessible under container\_path for each container. +If host\_path is relative then it is considered as a directory +relative to the executor's work directory. The directory will be +created and permissions copied from the corresponding directory (which +must exist) in the shared filesystem. + +The primary use-case for this isolator is to selectively make parts of +the shared filesystem private to each container. For example, a +private "/tmp" directory can be achieved with host\_path="tmp" and +container\_path="/tmp" which will create a directory "tmp" inside the +executor's work directory (mode 1777) and simultaneously mount it as +/tmp inside the container. This is transparent to processes running +inside the container. Containers will not be able to see the host's +/tmp or any other container's /tmp. + +### Pid Namespace + +The Pid Namespace isolator can be used to isolate each container in +a separate pid namespace with two main benefits: +1. Visibility: Processes running in the container (executor and + descendants) are unable to see or signal processes outside the + namespace. +2. Clean termination: Termination of the leading process in a pid + namespace will result in the kernel terminating all other processes + in the namespace. 
+ +The Launcher will use (2) during destruction of a container in +preference to the freezer cgroup, avoiding known kernel issues related +to freezing cgroups under OOM conditions. + +/proc will be mounted for containers so tools such as 'ps' will work +correctly.
