Repository: aurora
Updated Branches:
  refs/heads/master b429612ef -> c4903d873
Introduce a flag to treat RAM as a revocable resource

We plan to open source a very simple Mesos ResourceEstimator and QosController
that supports RAM and CPU oversubscription (ETA ~2 weeks). We have been using it
internally with a patched Aurora version where the hardcoded `isMesosRevocable`
flag of RAM has been set to `true`.

This patch makes this behaviour configurable.

Reviewed at https://reviews.apache.org/r/51807/

Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/c4903d87
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/c4903d87
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/c4903d87

Branch: refs/heads/master
Commit: c4903d873d090549ebdf9a07110851b5aad7d978
Parents: b429612
Author: Stephan Erb <[email protected]>
Authored: Tue Sep 13 00:09:29 2016 +0200
Committer: Stephan Erb <[email protected]>
Committed: Tue Sep 13 00:09:29 2016 +0200

----------------------------------------------------------------------
 RELEASE-NOTES.md                                |  2 ++
 docs/features/resource-isolation.md             |  6 ++--
 docs/operations/configuration.md                |  5 +++
 docs/reference/scheduler-configuration.md       | 24 ++++++++++---
 .../scheduler/resources/ResourceSettings.java   | 37 ++++++++++++++++++++
 .../scheduler/resources/ResourceType.java       |  6 ++--
 6 files changed, 70 insertions(+), 10 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/RELEASE-NOTES.md
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
index bbf7198..4476d52 100644
--- a/RELEASE-NOTES.md
+++ b/RELEASE-NOTES.md
@@ -35,6 +35,8 @@ schedulers up. A rolling upgrade would result in no leading scheduler for the duration of the roll which could be confusing to monitor and debug.
 - Add a new MTTS (Median Time To Starting) metric in addition to MTTA and MTTR.
+- In addition to CPU resources, RAM resources can now be treated as revocable via the scheduler
+  commandline flag `-enable_revocable_ram`.

 ### Deprecations and removals:

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/features/resource-isolation.md
----------------------------------------------------------------------
diff --git a/docs/features/resource-isolation.md b/docs/features/resource-isolation.md
index 01c5b40..503f2de 100644
--- a/docs/features/resource-isolation.md
+++ b/docs/features/resource-isolation.md
@@ -168,9 +168,9 @@ via the concept of revocable tasks. In contrast to non-revocable tasks, revocabl
 Mesos reserves the right to throttle or even kill them if they might affect existing high-priority user-facing services.

-As of today, the only revocable resource supported by Aurora are CPU resources. A job can opt-in to
-use those by specifying the `revocable` [Configuration Tier](../features/multitenancy.md#configuration-tiers).
-A revocable job will only be scheduled using revocable CPU resources, even if there are plenty of
+As of today, the only revocable resource supported by Aurora are CPU and RAM resources. A job can
+opt-in to use those by specifying the `revocable` [Configuration Tier](../features/multitenancy.md#configuration-tiers).
+A revocable job will only be scheduled using revocable resources, even if there are plenty of
 non-revocable resources available.

 The Aurora scheduler must be [configured to receive revocable offers](../operations/configuration.md#resource-isolation)

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/operations/configuration.md
----------------------------------------------------------------------
diff --git a/docs/operations/configuration.md b/docs/operations/configuration.md
index 90dde57..203f3be 100644
--- a/docs/operations/configuration.md
+++ b/docs/operations/configuration.md
@@ -126,6 +126,11 @@ and then set set this Aurora scheduler flag to allow receiving revocable Mesos o

     -receive_revocable_resources=true

+Both CPUs and RAM are supported as revocable resources. The former is enabled by the default,
+the latter needs to be enabled via:
+
+    -enable_revocable_ram=true
+
 Unless you want to use the [default](../../src/main/resources/org/apache/aurora/scheduler/tiers.json)
 tier configuration, you will also have to specify a file path:

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/docs/reference/scheduler-configuration.md
----------------------------------------------------------------------
diff --git a/docs/reference/scheduler-configuration.md b/docs/reference/scheduler-configuration.md
index 87d2cde..31be714 100644
--- a/docs/reference/scheduler-configuration.md
+++ b/docs/reference/scheduler-configuration.md
@@ -22,6 +22,8 @@ Required flags:
 	Max number of idle connections to the database via MyBatis
 -framework_authentication_file
 	Properties file which contains framework credentials to authenticate with Mesosmaster. Must contain the properties 'aurora_authentication_principal' and 'aurora_authentication_secret'.
+-ip
+	The ip address to listen. If not set, the scheduler will listen on all interfaces.
 -mesos_master_address [not null]
 	Address for the mesos master, can be a socket address or zookeeper path.
 -mesos_role
@@ -34,12 +36,16 @@ Required flags:
 	Path to the thermos executor entry point.
 -tier_config [file must be readable]
 	Configuration file defining supported task tiers, task traits and behaviors.
+-webhook_config [file must exist, file must be readable]
+	Path to webhook configuration file.
 -zk_endpoints [must have at least 1 item]
 	Endpoint specification for the ZooKeeper servers.

 Optional flags:
 -allow_docker_parameters (default false)
 	Allow to pass docker container parameters in the job.
+-allow_gpu_resource (default false)
+	Allow jobs to request Mesos GPU resource.
 -allowed_container_types (default [MESOS])
 	Container types that are allowed to be used by jobs.
 -async_slot_stat_update_interval (default (1, mins))
@@ -76,10 +82,16 @@ Optional flags:
 	List of domains for which CORS support should be enabled.
 -enable_h2_console (default false)
 	Enable H2 DB management console.
+-enable_mesos_fetcher (default false)
+	Allow jobs to pass URIs to the Mesos Fetcher. Note that enabling this feature could pose a privilege escalation threat.
 -enable_preemptor (default true)
 	Enable the preemptor and preemption
+-enable_revocable_cpus (default true)
+	Treat CPUs as a revocable resource.
+-enable_revocable_ram (default false)
+	Treat RAM as a revocable resource.
 -executor_user (default root)
-	User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the mesos slaves.
+	User to start the executor. Defaults to "root". Set this to an unprivileged user if the mesos master was started with "--no-root_submissions". If set to anything other than "root", the executor will ignore the "role" setting for jobs since it can't use setuid() anymore. This means that all your jobs will run under the specified user and the user has to exist on the Mesos agents.
 -first_schedule_delay (default (1, ms))
 	Initial amount of time to wait before first attempting to schedule a PENDING task.
 -flapping_task_threshold (default (5, mins))
@@ -163,7 +175,7 @@ Optional flags:
 -offer_hold_jitter_window (default (1, mins))
 	Maximum amount of random jitter to add to the offer hold time window.
 -offer_reservation_duration (default (3, mins))
-	Time to reserve a slave's offers while trying to satisfy a task preempting another.
+	Time to reserve a agent's offers while trying to satisfy a task preempting another.
 -populate_discovery_info (default false)
 	If true, Aurora populates DiscoveryInfo field of Mesos TaskInfo.
 -preemption_delay (default (3, mins))
@@ -174,6 +186,10 @@ Optional flags:
 	Time interval between pending task preemption slot searches.
 -receive_revocable_resources (default false)
 	Allows receiving revocable resource offers from Mesos.
+-reconciliation_explicit_batch_interval (default (5, secs))
+	Interval between explicit batch reconciliation requests.
+-reconciliation_explicit_batch_size (default 1000) [must be > 0]
+	Number of tasks in a single batch request sent to Mesos for explicit reconciliation.
 -reconciliation_explicit_interval (default (60, mins))
 	Interval on which scheduler will ask Mesos for status updates of all non-terminal tasks known to scheduler.
 -reconciliation_implicit_interval (default (60, mins))
@@ -186,7 +202,7 @@ Optional flags:
 	If false, Docker tasks may run without an executor (EXPERIMENTAL)
 -shiro_ini_path
 	Path to shiro.ini for authentication and authorization configuration.
--shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@13c9d689])
+-shiro_realm_modules (default [org.apache.aurora.scheduler.app.MoreModules$1@158a8276])
 	Guice modules for configuring Shiro Realms.
 -sla_non_prod_metrics (default [])
 	Metric categories collected for non production tasks.
@@ -218,8 +234,6 @@ Optional flags:
 	Whether to use the experimental database-backed task store.
 -viz_job_url_prefix (default )
 	URL prefix for job container stats.
--webhook_config [file must be readable]
-	File to configure a HTTP webhook to receive task state change events.
 -zk_chroot_path
 	chroot path to use for the ZooKeeper connections
 -zk_digest_credentials

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java b/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
new file mode 100644
index 0000000..c49fd06
--- /dev/null
+++ b/src/main/java/org/apache/aurora/scheduler/resources/ResourceSettings.java
@@ -0,0 +1,37 @@
+/**
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.aurora.scheduler.resources;
+
+import org.apache.aurora.common.args.Arg;
+import org.apache.aurora.common.args.CmdLine;
+
+/**
+ * Control knobs for how Aurora treats different resource types.
+ *
+ * The command line handling seen here is non-standard. Normally we declare them in modules
+ * and then inject them via 'settings' classes. Unfortunately, this does not work here as we
+ * would need to perform the injection into the ResourceType enum. Enums are picky in that regard.
+ */
+final class ResourceSettings {
+
+  @CmdLine(name = "enable_revocable_cpus", help = "Treat CPUs as a revocable resource.")
+  static final Arg<Boolean> ENABLE_REVOCABLE_CPUS = Arg.create(true);
+
+  @CmdLine(name = "enable_revocable_ram", help = "Treat RAM as a revocable resource.")
+  static final Arg<Boolean> ENABLE_REVOCABLE_RAM = Arg.create(false);
+
+  private ResourceSettings() {
+
+  }
+}

http://git-wip-us.apache.org/repos/asf/aurora/blob/c4903d87/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java b/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
index 4c102a3..e1a5dce 100644
--- a/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
+++ b/src/main/java/org/apache/aurora/scheduler/resources/ResourceType.java
@@ -36,6 +36,8 @@ import static org.apache.aurora.scheduler.resources.AuroraResourceConverter.STRI
 import static org.apache.aurora.scheduler.resources.MesosResourceConverter.RANGES;
 import static org.apache.aurora.scheduler.resources.MesosResourceConverter.SCALAR;
 import static org.apache.aurora.scheduler.resources.ResourceMapper.PORT_MAPPER;
+import static org.apache.aurora.scheduler.resources.ResourceSettings.ENABLE_REVOCABLE_CPUS;
+import static org.apache.aurora.scheduler.resources.ResourceSettings.ENABLE_REVOCABLE_RAM;

 /**
  * Describes Mesos resource types and their Aurora traits.
@@ -55,7 +57,7 @@ public enum ResourceType implements TEnum {
       "core(s)",
       16,
       false,
-      true),
+      ENABLE_REVOCABLE_CPUS.get()),

   /**
    * RAM resource.
@@ -70,7 +72,7 @@ public enum ResourceType implements TEnum {
       "MB",
       Amount.of(24, GB).as(MB),
       false,
-      false),
+      ENABLE_REVOCABLE_RAM.get()),

   /**
    * DISK resource.
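
Note on the pattern used in this patch: the enum constants call `ENABLE_REVOCABLE_CPUS.get()` and `ENABLE_REVOCABLE_RAM.get()` in their constructor arguments, so the flag values are presumably captured once, when the enum class is initialized. The following is a minimal, standalone sketch of that Java behaviour only. `Flag`, `DemoSettings`, `DemoResourceType` and `RevocableFlagDemo` are hypothetical stand-ins invented for illustration; they are not Aurora's `Arg`, `ResourceSettings` or `ResourceType`, and the sketch does not reproduce Aurora's actual argument parsing.

```java
// Hypothetical demo, not Aurora code: shows that enum constants capture a
// flag value at enum class initialization time.
class RevocableFlagDemo {
  public static void main(String[] args) {
    // Simulate parsing "-enable_revocable_ram=true" before the enum is first used.
    DemoSettings.ENABLE_REVOCABLE_RAM.set(true);
    System.out.println(DemoResourceType.RAM_MB.isRevocable()); // prints true

    // Changing the flag after the enum has been initialized has no effect.
    DemoSettings.ENABLE_REVOCABLE_RAM.set(false);
    System.out.println(DemoResourceType.RAM_MB.isRevocable()); // still prints true
  }
}

/** Hypothetical stand-in for a parsed command line argument holder. */
final class Flag<T> {
  private T value;

  private Flag(T defaultValue) {
    this.value = defaultValue;
  }

  static <T> Flag<T> create(T defaultValue) {
    return new Flag<>(defaultValue);
  }

  T get() {
    return value;
  }

  void set(T newValue) {
    this.value = newValue;
  }
}

/** Hypothetical settings holder using the same defaults as the patch. */
final class DemoSettings {
  static final Flag<Boolean> ENABLE_REVOCABLE_CPUS = Flag.create(true);
  static final Flag<Boolean> ENABLE_REVOCABLE_RAM = Flag.create(false);

  private DemoSettings() {
  }
}

/** Hypothetical enum whose constants read the flags when the enum class loads. */
enum DemoResourceType {
  CPUS(DemoSettings.ENABLE_REVOCABLE_CPUS.get()),
  RAM_MB(DemoSettings.ENABLE_REVOCABLE_RAM.get()),
  DISK_MB(false);

  private final boolean revocable;

  DemoResourceType(boolean revocable) {
    this.revocable = revocable;
  }

  boolean isRevocable() {
    return revocable;
  }
}
```

If Aurora's argument handling behaves like this sketch, the flags would need to be parsed before `ResourceType` is first touched, which may be part of why the Javadoc calls the approach non-standard.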
