Repository: mesos Updated Branches: refs/heads/master 6b842c27b -> eba5a7339
Added oversubscription user doc. Review: https://reviews.apache.org/r/36488 Project: http://git-wip-us.apache.org/repos/asf/mesos/repo Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/eba5a733 Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/eba5a733 Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/eba5a733 Branch: refs/heads/master Commit: eba5a73393769a4980b65782d95b6af0cf93f7c0 Parents: 6b842c2 Author: Niklas Nielsen <[email protected]> Authored: Fri Jul 17 15:30:35 2015 -0700 Committer: Niklas Q. Nielsen <[email protected]> Committed: Fri Jul 17 15:30:35 2015 -0700 ---------------------------------------------------------------------- docs/images/oversubscription-overview.jpg | Bin 0 -> 67771 bytes docs/oversubscription.md | 305 +++++++++++++++++++++++++ 2 files changed, 305 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mesos/blob/eba5a733/docs/images/oversubscription-overview.jpg ---------------------------------------------------------------------- diff --git a/docs/images/oversubscription-overview.jpg b/docs/images/oversubscription-overview.jpg new file mode 100644 index 0000000..2d31097 Binary files /dev/null and b/docs/images/oversubscription-overview.jpg differ http://git-wip-us.apache.org/repos/asf/mesos/blob/eba5a733/docs/oversubscription.md ---------------------------------------------------------------------- diff --git a/docs/oversubscription.md b/docs/oversubscription.md new file mode 100644 index 0000000..f17d4d4 --- /dev/null +++ b/docs/oversubscription.md @@ -0,0 +1,305 @@ +--- layout: documentation --- + +# Oversubscription + +High-priority user-facing services are typically provisioned on large clusters +for peak load and unexpected load spikes. Hence, for most of time, the +provisioned resources remain underutilized. Oversubscription takes advantage of +temporarily unused resources to execute best-effort tasks such as background +analytics, video/image processing, chip simulations, and other low priority +jobs. + +## How does it work? + +Oversubscription was introduced in Mesos 0.23.0 and adds two new slave +components: a Resource Estimator and a Quality of Service (QoS) Controller, +alongside extending the existing resource allocator, resource monitor, and +mesos slave. The new components and their interactions are illustrated below. + + + +### Resource estimation + + - (1) The first step is to identify the amount of oversubscribed resources. + The resource estimator taps into the resource monitor and periodically gets +usage statistics via `ResourceStatistic` messages. The resource estimator +applies logic based on the collected resource statistics to determine the +amount of oversubscribed resources. This can be a series of control algorithms +based on measured resource usage slack (allocated but unused resources) and +allocation slack. + + - (2) The slave keeps polling estimates from the resource estimator and tracks + the latest estimate. + + - (3) The slave will send the total amount of oversubscribed resources to the + master when the latest estimate is different from the previous estimate. + +### Resource tracking & scheduling algorithm + + - (4) The allocator keeps track of the oversubscribed resources separately + from regular resources and annotate those resources as `revocable`. It is up +to the resource estimator to determine which types of resources can be +oversubscribed. It is recommended only to oversubscribe _compressible_ +resources such as cpu shares, bandwidth, etc. + +### Frameworks + + - (5) Frameworks can choose to launch tasks on revocable resources by using + the regular launchTasks() API. To safe-guard frameworks that are not +designed to deal with preemption, only frameworks registering with the +`REVOCABLE_RESOURCES` capability set in its framework info will receive offers +with revocable resources. Further more, recovable resources cannot be +dynamically reserved and persistent volumes should not be created on revocable +disk resources. + +### Task launch + + - The revocable task is launched as usual when the runTask request is received + on the slave. The resources will still be marked as revocable and isolators +can take appropriate actions, if certain resources need to be setup differently +for revocable and regular tasks. + +> NOTE: If any resource used by a task or executor is +revocable, the whole container is treated as a revocable container and can +therefore be killed or throttled by the QoS Controller. + +### Interference detection + + - (6) When the revocable task is running, it is important to constantly + monitor the original task running on those resources and guarantee +performance based on an SLA. In order to react to detected interference, the +QoS controller needs to be able to kill or throttle running revocable tasks. + +## Enabling frameworks to use oversubscribed resources + +Frameworks planning to use oversubscribed resources need to register with the +`REVOCABLE_RESOURCES` capability set: + +~~~{.cpp} +FrameworkInfo framework; +framework.set_name("Revocable framework"); + +framework.add_capabilities()->set_type( + FrameworkInfo::Capability::REVOCABLE_RESOURCES); +~~~ + +From that point on, the framework will start to receive revocable resources in +offers. + +> NOTE: That there is no guarantee that the Mesos cluster has oversubscription +enabled. If not, no revocable resources will be offered. See below for +instructions how to configure Mesos for oversubscription. + +### Launching tasks using revocable resources + +Launching tasks using recovable resources is done through the existing +`launchTasks` API. Revocable resources will have the `recovable` field set. See +below for an example offer with regular and revocable resources. + +~~~{.json} +{ + "id": "20150618-112946-201330860-5050-2210-0000", + "framework_id": "20141119-101031-201330860-5050-3757-0000", + "slave_id": "20150618-112946-201330860-5050-2210-S1", + "hostname": "foobar", + "resources": [ + { + "name": "cpus", + "type": "SCALAR", + "scalar": { + "value": 2.0 + }, + "role": "*" + }, { + "name": "mem", + "type": "SCALAR", + "scalar": { + "value": 512.0 + }, + "role": "*" + }, + { + "name": "cpus", + "type": "SCALAR", + "scalar": { + "value": 0.45 + }, + "role": "*", + "revocable": {} + } + ] +} +~~~ + +## Writing a custom resource estimator + +The resource estimator estimates and predicts the total resources used on the +slave and informs the master about resources that can be oversubscribed. By +default, Mesos comes with a `noop` and a `fixed` resource estimator. The `noop` +estimator only provides an empty estimate to the slave and stalls, effectively +disabling oversubscription. The `fixed` estimator doesn't use the actual +measured slack, but oversubscribes the node with fixed resource amount (defined +via a command line flag). + +The interface is defined below: + +~~~{.cpp} +class ResourceEstimator +{ +public: + // Initializes this resource estimator. This method needs to be + // called before any other member method is called. It registers + // a callback in the resource estimator. The callback allows the + // resource estimator to fetch the current resource usage for each + // executor on slave. + virtual Try<Nothing> initialize( + const lambda::function<process::Future<ResourceUsage>()>& usage) = 0; + + // Returns the current estimation about the *maximum* amount of + // resources that can be oversubscribed on the slave. A new + // estimation will invalidate all the previously returned + // estimations. The slave will be calling this method periodically + // to forward it to the master. As a result, the estimator should + // respond with an estimate every time this method is called. + virtual process::Future<Resources> oversubscribable() = 0; +}; +~~~ + +## Writing a custom QoS controller + +The interface for implementing custom QoS Controllers is defined below: + +~~~{.cpp} +class QoSController +{ +public: + // Initializes this QoS Controller. This method needs to be + // called before any other member method is called. It registers + // a callback in the QoS Controller. The callback allows the + // QoS Controller to fetch the current resource usage for each + // executor on slave. + virtual Try<Nothing> initialize( + const lambda::function<process::Future<ResourceUsage>()>& usage) = 0; + + // A QoS Controller informs the slave about corrections to carry + // out, but returning futures to QoSCorrection objects. For more + // information, please refer to mesos.proto. + virtual process::Future<std::list<QoSCorrection>> corrections() = 0; +}; +~~~ + +> NOTE The QoS Controller must not block `corrections()`. Back the QoS +> Controller with it's own libprocess actor instead. + +The QoS Controller informs the slave that particular corrective actions need to +be made. Each corrective action contains information about executor or task and +the type of action to perform. + +~~~{.proto} +message QoSCorrection { + enum Type { + KILL = 1; // Terminate an executor. + } + + message Kill { + optional FrameworkID framework_id = 1; + optional ExecutorID executor_id = 2; + } + + required Type type = 1; + optional Kill kill = 2; +} +~~~ + +## Configuring Mesos for oversubscription + +Five new flags has been added to the slave: + +<table class="table table-striped"> + <thead> + <tr> + <th width="30%"> + Flag + </th> + <th> + Explanation + </th> + </thead> + + <tr> + <td> + --oversubscribed_resources_interval=VALUE + </td> + <td> + The slave periodically updates the master with the current estimation +about the total amount of oversubscribed resources that are allocated and +available. The interval between updates is controlled by this flag. (default: +15secs) + </td> + </tr> + + <tr> + <td> + --qos_controller=VALUE + </td> + <td> + The name of the QoS Controller to use for oversubscription. + </td> + </tr> + + <tr> + <td> + --qos_correction_interval_min=VALUE + </td> + <td> + The slave polls and carries out QoS corrections from the QoS Controller +based on its observed performance of running tasks. The smallest interval +between these corrections is controlled by this flag. (default: 0ns) + </td> + </tr> + + <tr> + <td> + --resource_estimator=VALUE + </td> + <td> + The name of the resource estimator to use for oversubscription. + </td> + </tr> + + <tr> + <td> + --resource_monitoring_interval=VALUE + </td> + <td> + Periodic time interval for monitoring executor resource usage (e.g., +10secs, 1min, etc) (default: 1secs) + </td> + </tr> + +</table> + +The `fixed` resource estimator is enabled as follows: + +``` +--resource_estimator="org_apache_mesos_FixedResourceEstimator" + +--modules='{ + "libraries": { + "file": "/usr/local/lib64/libfixed_resource_estimator.so", + "modules": { + "name": "org_apache_mesos_FixedResourceEstimator", + "parameters": { + "key": "resources", + "value": "cpus:14" + } + } + } +}' +``` + +In the example above, a fixed amount of 14 cpus will be offered as revocable +resources. + +To select custom a resource estimator and QoS controller, please refer to the +[modules documentation](modules.md).
