Added: aurora/site/source/documentation/latest/development/scheduler.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/scheduler.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/scheduler.md (added) +++ aurora/site/source/documentation/latest/development/scheduler.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,118 @@ +Developing the Aurora Scheduler +=============================== + +The Aurora scheduler is written in Java code and built with [Gradle](http://gradle.org). + + +Prerequisite +============ + +When using Apache Aurora checked out from the source repository or the binary +distribution, the Gradle wrapper and JavaScript dependencies are provided. +However, you need to manually install them when using the source release +downloads: + +1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org) +2. From the root directory of the Apache Aurora project generate the Gradle +wrapper by running: + + gradle wrapper + + +Getting Started +=============== + +You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then + + ./gradlew tasks + +will bootstrap the build system and show available tasks. This can take a while the first time you +run it but subsequent runs will be much faster due to cached artifacts. + +Running the Tests +----------------- +Aurora has a comprehensive unit test suite. To run the tests use + + ./gradlew build + +Gradle will only re-run tests when dependencies of them have changed. To force a re-run of all +tests use + + ./gradlew clean build + +Running the build with code quality checks +------------------------------------------ +To speed up development iteration, the plain gradle commands will not run static analysis tools. +However, you should run these before posting a review diff, and **always** run this before pushing a +commit to origin/master. + + ./gradlew build -Pq + +Running integration tests +------------------------- +To run the same tests that are run in the Apache Aurora continuous integration +environment: + + ./build-support/jenkins/build.sh + +In addition, there is an end-to-end test that runs a suite of aurora commands +using a virtual cluster: + + ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh + +Creating a bundle for deployment +-------------------------------- +Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with + + ./gradlew distZip + +or a tar file containing the same files with + + ./gradlew distTar + +The output file will be written to `dist/distributions/aurora-scheduler.zip` or +`dist/distributions/aurora-scheduler.tar`. + + + +Developing Aurora Java code +=========================== + +Setting up an IDE +----------------- +Gradle can generate project files for your IDE. To generate an IntelliJ IDEA project run + + ./gradlew idea + +and import the generated `aurora.ipr` file. + +Adding or Upgrading a Dependency +-------------------------------- +New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`. +For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block: + + compile 'com.example:example-lib:1.0' + +NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the +Apache Foundation's third-party licensing +[policy](http://www.apache.org/legal/resolved.html#category-x). 


Developing the Aurora Build System
==================================

Bootstrapping Gradle
--------------------
The following files were autogenerated by `gradle wrapper` using gradle's
[Wrapper](http://www.gradle.org/docs/current/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and
should not be modified directly:

    ./gradlew
    ./gradlew.bat
    ./gradle/wrapper/gradle-wrapper.jar
    ./gradle/wrapper/gradle-wrapper.properties

To upgrade Gradle unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the
repository root and commit the changed files.
Added: aurora/site/source/documentation/latest/development/thermos.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thermos.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/thermos.md (added) +++ aurora/site/source/documentation/latest/development/thermos.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,126 @@ +The Python components of Aurora are built using [Pants](https://pantsbuild.github.io). + + +Python Build Conventions +======================== +The Python code is laid out according to the following conventions: + +1. 1 `BUILD` per 3rd level directory. For a list of current top-level packages run: + + % find src/main/python -maxdepth 3 -mindepth 3 -type d |\ + while read dname; do echo $dname |\ + sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done + +2. Each `BUILD` file exports 1 + [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library) + that provides a + [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py) + containing each + [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary) + in the `BUILD` file, named the same as the directory it's in so that it can be referenced + without a ':' character. The `sources` field in the `python_library` will almost always be + `rglobs('*.py')`. + +3. Other BUILD files may only depend on this single public `python_library` + target. Any other target is considered a private implementation detail and + should be prefixed with an `_`. + +4. `python_binary` targets are always named the same as the exported console script. + +5. `python_binary` targets must have identical `dependencies` to the `python_library` exported + by the package and must use `entry_point`. + + The means a PEX file generated by pants will contain exactly the same files that will be + available on the `PYTHONPATH` in the case of `pip install` of the corresponding library + target. This will help our migration off of Pants in the future. + +Annotated example - apache.thermos.runner +----------------------------------------- + + % find src/main/python/apache/thermos/runner + src/main/python/apache/thermos/runner + src/main/python/apache/thermos/runner/__init__.py + src/main/python/apache/thermos/runner/thermos_runner.py + src/main/python/apache/thermos/runner/BUILD + % cat src/main/python/apache/thermos/runner/BUILD + # License boilerplate omitted + import os + + + # Private target so that a setup_py can exist without a circular dependency. Only targets within + # this file should depend on this. + python_library( + name = '_runner', + # The target covers every python file under this directory and subdirectories. + sources = rglobs('*.py'), + dependencies = [ + '3rdparty/python:twitter.common.app', + '3rdparty/python:twitter.common.log', + # Source dependencies are always referenced without a ':'. + 'src/main/python/apache/thermos/common', + 'src/main/python/apache/thermos/config', + 'src/main/python/apache/thermos/core', + ], + ) + + # Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an + # argument to ./pants binary. + python_binary( + name = 'thermos_runner', + # Use entry_point, not source so the files used here are the same ones tests see. + entry_point = 'apache.thermos.bin.thermos_runner', + dependencies = [ + # Notice that we depend only on the single private target from this BUILD file here. 
+ ':_runner', + ], + ) + + # The public library that everyone importing the runner symbols uses. + # The test targets and any other dependent source code should depend on this. + python_library( + name = 'runner', + dependencies = [ + # Again, notice that we depend only on the single private target from this BUILD file here. + ':_runner', + ], + # We always provide a setup_py. This will cause any dependee libraries to automatically + # reference this library in their requirements.txt rather than copy the source files into their + # sdist. + provides = setup_py( + # Conventionally named and versioned. + name = 'apache.thermos.runner', + version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(), + ).with_binaries({ + # Every binary in this file should also be repeated here. + # Always use the dict-form of .with_binaries so that commands with dashes in their names are + # supported. + # The console script name is always the same as the PEX with .pex stripped. + 'thermos_runner': ':thermos_runner', + }), + ) + + + +Thermos Test resources +====================== + +The Aurora source repository and distributions contain several +[binary files](../../src/test/resources/org/apache/thermos/root/checkpoints) to +qualify the backwards-compatibility of thermos with checkpoint data. Since +thermos persists state to disk, to be read by the thermos observer), it is important that we have +tests that prevent regressions affecting the ability to parse previously-written data. + +The files included represent persisted checkpoints that exercise different +features of thermos. The existing files should not be modified unless +we are accepting backwards incompatibility, such as with a major release. + +It is not practical to write source code to generate these files on the fly, +as source would be vulnerable to drift (e.g. due to refactoring) in ways +that would undermine the goal of ensuring backwards compatibility. + +The most common reason to add a new checkpoint file would be to provide +coverage for new thermos features that alter the data format. This is +accomplished by writing and running a +[job configuration](../reference/configuration.md) that exercises the feature, and +copying the checkpoint file from the sandbox directory, by default this is +`/var/run/thermos/checkpoints/<aurora task id>`. Added: aurora/site/source/documentation/latest/development/thrift.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/thrift.md (added) +++ aurora/site/source/documentation/latest/development/thrift.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,54 @@ +Thrift +====== + +Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in +client/server RPC protocol as well as for internal data storage. While Thrift is capable of +correctly handling additions and renames of the existing members, field removals must be done +carefully to ensure backwards compatibility and provide predictable deprecation cycle. This +document describes general guidelines for making Thrift schema changes to the existing fields in +[api.thrift](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift). + +It is highly recommended to go through the +[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on +basic Thrift schema concepts. 
+ +Checklist +--------- +Every existing Thrift schema modification is unique in its requirements and must be analyzed +carefully to identify its scope and expected consequences. The following checklist may help in that +analysis: +* Is this a new field/struct? If yes, go ahead +* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename +* Anything else, read further to make sure your change is properly planned + +Deprecation cycle +----------------- +Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle +must be followed: + +### vCurrent +Change is applied in a way that does not break scheduler/client with this version to +communicate with scheduler/client from vCurrent-1. +* Do not remove or rename the old field +* Add a new field as an eventual replacement of the old one and implement a dual read/write +anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns +are marked as `NOT NULL` +* Check [storage.thrift](../../api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if +the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also +necessary to perform a [DB migration](db-migration.md). +* Add a deprecation jira ticket into the vCurrent+1 release candidate +* Add a TODO for the deprecated field mentioning the jira ticket + +### vCurrent+1 +Finalize the change by removing the deprecated fields from the Thrift schema. +* Drop any dual read/write routines added in the previous version +* Remove thrift backfilling in scheduler +* Remove the deprecated Thrift field + +Testing +------- +It's always advisable to test your changes in the local vagrant environment to build more +confidence that you change is backwards compatible. It's easy to simulate different +client/scheduler versions by playing with `aurorabuild` command. See [this document](../getting-started/vagrant.md) +for more. + Added: aurora/site/source/documentation/latest/development/ui.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/ui.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/ui.md (added) +++ aurora/site/source/documentation/latest/development/ui.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,46 @@ +Developing the Aurora Scheduler UI +================================== + +Installing bower (optional) +---------------------------- +Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are +managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or +update JS libraries. Bower can be installed using the following command: + + npm install -g bower + +Bower depends on node.js and npm. The easiest way to install node on a mac is via brew: + + brew install node + +For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation. + +More info on installing and using bower can be found at: http://bower.io/. Once installed, you can +use the following commands to view and modify the bower repo at +3rdparty/javascript/bower_components + + bower list + bower install <library name> + bower remove <library name> + bower update <library name> + bower help + + +Faster Iteration in Vagrant +--------------------------- +The scheduler serves UI assets from the classpath. 
For production deployments this means the assets +are served from within a jar. However, for faster development iteration, the vagrant image is +configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of +`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where +your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in +your checkout will be reflected immediately in the UI served from within the vagrant image. + +The one caveat to this is that this path is under `dist` not `src`. This is because the assets must +be processed by gradle before they can be served. So, unfortunately, you cannot just save your local +changes and see them reflected in the UI, you must first run `./gradlew processResources`. This is +less than ideal, but better than having to restart the scheduler after every change. Additionally, +gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run: +`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the +task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does +allow for <5s from save to changes being visibile in the UI with no further action required on the +part of the developer. Added: aurora/site/source/documentation/latest/features/constraints.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/constraints.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/constraints.md (added) +++ aurora/site/source/documentation/latest/features/constraints.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,126 @@ +Scheduling Constraints +====================== + +By default, Aurora will pick any random slave with sufficient resources +in order to schedule a task. This scheduling choice can be further +restricted with the help of constraints. + + +Mesos Attributes +---------------- + +Data centers are often organized with hierarchical failure domains. Common failure domains +include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag +the Mesos slave with them as +[attributes](https://mesos.apache.org/documentation/attributes-resources/). + +The Mesos slave `--attributes` command line argument can be used to mark slaves with +static key/value pairs, so called attributes (not to be confused with `--resources`, which are +dynamic and accounted). + +For example, consider the host `cluster1-aaa-03-sr2` and its following attributes (given in +key:value format): `host:cluster1-aaa-03-sr2` and `rack:aaa`. + +Aurora makes these attributes available for matching with scheduling constraints. + + +Limit Constraints +----------------- + +Limit constraints allow to control machine diversity using constraints. The below +constraint ensures that no more than two instances of your job may run on a single host. +Think of this as a "group by" limit. + + Service( + name = 'webservice', + role = 'www-data', + constraints = { + 'host': 'limit:2', + } + ... + ) + + +Likewise, you can use constraints to control rack diversity, e.g. at +most one task per rack: + + constraints = { + 'rack': 'limit:1', + } + +Use these constraints sparingly as they can dramatically reduce Tasks' schedulability. 
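For illustration only, here is a sketch of a complete job that combines both of the above limits,
assuming the `rack` attribute is configured on the slaves as described earlier; the process,
resource values, and instance count are placeholders:

    hello = Process(name = 'hello', cmdline = 'echo hello && sleep 60')

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'prod',
        role = 'www-data',
        name = 'webservice',
        instances = 8,
        task = SequentialTask(
          processes = [hello],
          resources = Resources(cpu = 0.5, ram = 128*MB, disk = 128*MB)),
        # Spread instances: at most 2 per host and at most 1 per rack.
        constraints = {
          'host': 'limit:2',
          'rack': 'limit:1',
        },
      )
    ]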
+Further details are available in the reference documentation on +[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints). + + + +Value Constraints +----------------- + +Value constraints can be used to express that a certain attribute with a certain value +should be present on a Mesos slave. For example, the following job would only be +scheduled on nodes that claim to have an `SSD` as their disk. + + Service( + name = 'webservice', + role = 'www-data', + constraints = { + 'disk': 'SSD', + } + ... + ) + + +Further details are available in the reference documentation on +[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints). + + +Running stateful services +------------------------- + +Aurora is best suited to run stateless applications, but it also accommodates for stateful services +like databases, or services that otherwise need to always run on the same machines. + +### Dedicated attribute + +Most of the Mesos attributes arbitrary and available for custom use. There is one exception, +though: the `dedicated` attribute. Aurora treats this specially, and only allows matching jobs to +run on these machines, and will only schedule matching jobs on these machines. + + +#### Syntax +The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created, +the scheduler requires that the `$role` component matches the `role` field in the job +configuration, and will reject the job creation otherwise. The remainder of the attribute is +free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not +enforce this. For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as +`www-data/web.multi` will have its tasks scheduled only on Mesos slaves configured with: +`--attributes=dedicated:www-data/web.multi`. + +A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any +owner to elect for a job to run on the host(s). For example: tasks from both +`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint +formatted as `*/web.multi` will be scheduled only on Mesos slaves configured with +`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of +machines sharing the same set of traits or requirements. + +##### Example +Consider the following slave command line: + + mesos-slave --attributes="dedicated:db_team/redis" ... + +And this job configuration: + + Service( + name = 'redis', + role = 'db_team', + constraints = { + 'dedicated': 'db_team/redis' + } + ... + ) + +The job configuration is indicating that it should only be scheduled on slaves with the attribute +`dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do _not_ have that +constraint from running on those slaves. + Added: aurora/site/source/documentation/latest/features/containers.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/containers.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/containers.md (added) +++ aurora/site/source/documentation/latest/features/containers.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,43 @@ +Containers +========== + + +Docker +------ + +Aurora has optional support for launching Docker containers, if correctly [configured by an Operator](../operations/configuration.md#docker-containers). 

Example (available in the [Vagrant environment](../getting-started/vagrant.md)):

    $ cat /vagrant/examples/jobs/docker/hello_docker.aurora
    hello_docker = Process(
      name = 'hello',
      cmdline = """
        while true; do
          echo hello world
          sleep 10
        done
      """)

    hello_world_docker = Task(
      name = 'hello docker',
      processes = [hello_docker],
      resources = Resources(cpu = 1, ram = 1*MB, disk = 8*MB)
    )

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'devel',
        role = 'docker-test',
        name = 'hello_docker',
        task = hello_world_docker,
        container = Container(docker = Docker(image = 'python:2.7'))
      )
    ]

In order to correctly execute processes inside a job, the docker container must have Python 2.7
installed. Further details of how to use Docker can be found in the
[Reference Documentation](../reference/configuration.md#docker-object).

Added: aurora/site/source/documentation/latest/features/cron-jobs.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/cron-jobs.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/features/cron-jobs.md (added)
+++ aurora/site/source/documentation/latest/features/cron-jobs.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,124 @@
# Cron Jobs

Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.

- [Overview](#overview)
- [Collision Policies](#collision-policies)
- [Failure recovery](#failure-recovery)
- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
  - [cron schedule](#cron-schedule)
  - [cron deschedule](#cron-deschedule)
  - [cron start](#cron-start)
  - [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
- [Technical Note About Syntax](#technical-note-about-syntax)
- [Caveats](#caveats)
  - [Failovers](#failovers)
  - [Collision policy is best-effort](#collision-policy-is-best-effort)
  - [Timezone Configuration](#timezone-configuration)

## Overview

A job is identified as a cron job by the presence of a
`cron_schedule` attribute containing a cron-style schedule in the
[`Job`](../reference/configuration.md#job-objects) object. Examples of cron schedules
include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`0 17 * * FRI`), and
"the 1st and 15th day of the month at 03:00" (`0 3 1,15 * *`).

Example (available in the [Vagrant environment](../getting-started/vagrant.md)):

    $ cat /vagrant/examples/jobs/cron_hello_world.aurora
    # A cron job that runs every 5 minutes.
    jobs = [
      Job(
        cluster = 'devcluster',
        role = 'www-data',
        environment = 'test',
        name = 'cron_hello_world',
        cron_schedule = '*/5 * * * *',
        task = SimpleTask(
          'cron_hello_world',
          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
      ),
    ]

## Collision Policies

The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
triggered while an older run hasn't finished. The scheduler has two policies available:

* `KILL_EXISTING`: The default policy - on a collision the old instances are killed and new
instances with the current configuration are started.
* `CANCEL_NEW`: On a collision the new run is cancelled.

Note that the use of `CANCEL_NEW` is likely a code smell - interrupted cron jobs should be able
to recover their progress on a subsequent invocation, otherwise they risk having their work queue
grow faster than they can process it.
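To make the chosen policy explicit, `cron_collision_policy` is set directly on the
[`Job`](../reference/configuration.md#job-objects) object. A hedged sketch; the job name and
command are hypothetical, and `CANCEL_NEW` is only appropriate if a skipped run can catch up later:

    jobs = [
      Job(
        cluster = 'devcluster',
        role = 'www-data',
        environment = 'test',
        name = 'cron_nightly_report',
        cron_schedule = '0 3 * * *',           # daily at 03:00
        # Skip a run if the previous one is still active instead of killing it.
        cron_collision_policy = 'CANCEL_NEW',
        task = SimpleTask('cron_nightly_report', './run_nightly_report.sh'),
      ),
    ]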
+ +## Failure recovery + +Unlike with services, which aurora will always re-execute regardless of exit status, instances of +cron jobs retry according to the `max_task_failures` attribute of the +[Task](../reference/configuration.md#task-object) object. To get "run-until-success" semantics, +set `max_task_failures` to `-1`. + +## Interacting with cron jobs via the Aurora CLI + +Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h` +for up-to-date usage instructions. + +### cron schedule +Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template +with a new one. Only future runs will be affected, any existing active tasks are left intact. + + $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora + +### cron deschedule +Deschedules a cron job, preventing future runs but allowing current runs to complete. + + $ aurora cron deschedule devcluster/www-data/test/cron_hello_world + +### cron start +Start a cron job immediately, outside of its normal cron schedule. + + $ aurora cron start devcluster/www-data/test/cron_hello_world + +### job killall, job restart, job kill +Cron jobs create instances running on the cluster that you can interact with like normal Aurora +tasks with `job kill` and `job restart`. + + +## Technical Note About Syntax + +`cron_schedule` uses a restricted subset of BSD crontab syntax. While the +execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD +[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax. See +[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124) +for details. + + +## Caveats + +### Failovers +No failover recovery. Aurora does not record the latest minute it fired +triggers for across failovers. Therefore it's possible to miss triggers +on failover. Note that this behavior may change in the future. + +It's necessary to sync time between schedulers with something like `ntpd`. +Clock skew could cause double or missed triggers in the case of a failover. + +### Collision policy is best-effort +Aurora aims to always have *at least one copy* of a given instance running at a time - it's +an AP system, meaning it chooses Availability and Partition Tolerance at the expense of +Consistency. + +If your collision policy was `CANCEL_NEW` and a task has terminated but +Aurora has not noticed this Aurora will go ahead and create your new +task. + +If your collision policy was `KILL_EXISTING` and a task was marked `LOST` +but not yet GCed Aurora will go ahead and create your new task without +attempting to kill the old one (outside the GC interval). + +### Timezone Configuration +Cron timezone is configured indepdendently of JVM timezone with the `-cron_timezone` flag and +defaults to UTC. Added: aurora/site/source/documentation/latest/features/job-updates.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/job-updates.md (added) +++ aurora/site/source/documentation/latest/features/job-updates.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,111 @@ +Aurora Job Updates +================== + +`Job` configurations can be updated at any point in their lifecycle. 
+Usually updates are done incrementally using a process called a *rolling +upgrade*, in which Tasks are upgraded in small groups, one group at a +time. Updates are done using various Aurora Client commands. + + +Rolling Job Updates +------------------- + +There are several sub-commands to manage job updates: + + aurora update start <job key> <configuration file> + aurora update info <job key> + aurora update pause <job key> + aurora update resume <job key> + aurora update abort <job key> + aurora update list <cluster> + +When you `start` a job update, the command will return once it has sent the +instructions to the scheduler. At that point, you may view detailed +progress for the update with the `info` subcommand, in addition to viewing +graphical progress in the web browser. You may also get a full listing of +in-progress updates in a cluster with `list`. + +Once an update has been started, you can `pause` to keep the update but halt +progress. This can be useful for doing things like debug a partially-updated +job to determine whether you would like to proceed. You can `resume` to +proceed. + +You may `abort` a job update regardless of the state it is in. This will +instruct the scheduler to completely abandon the job update and leave the job +in the current (possibly partially-updated) state. + +For a configuration update, the Aurora Client calculates required changes +by examining the current job config state and the new desired job config. +It then starts a *rolling batched update process* by going through every batch +and performing these operations: + +- If an instance is present in the scheduler but isn't in the new config, + then that instance is killed. +- If an instance is not present in the scheduler but is present in + the new config, then the instance is created. +- If an instance is present in both the scheduler and the new config, then + the client diffs both task configs. If it detects any changes, it + performs an instance update by killing the old config instance and adds + the new config instance. + +The Aurora client continues through the instance list until all tasks are +updated, in `RUNNING,` and healthy for a configurable amount of time. +If the client determines the update is not going well (a percentage of health +checks have failed), it cancels the update. + +Update cancellation runs a procedure similar to the described above +update sequence, but in reverse order. New instance configs are swapped +with old instance configs and batch updates proceed backwards +from the point where the update failed. E.g.; (0,1,2) (3,4,5) (6,7, +8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0). + +For details how to control a job update, please see the +[UpdateConfig](../reference/configuration.md#updateconfig-objects) configuration object. + + +Coordinated Job Updates +------------------------ + +Some Aurora services may benefit from having more control over updates by explicitly +acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical +service updates where explicit job health monitoring is vital during the entire job update +lifecycle. Such job updates would rely on an external service (or a custom client) periodically +pulsing an active coordinated job update via a +[pulseJobUpdate RPC](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift). + +A coordinated update is defined by setting a positive +[pulse_interval_secs](../reference/configuration.md#updateconfig-objects) value in job configuration +file. 
If no pulses are received within specified interval the update will be blocked. A blocked +update is unable to continue rolling forward (or rolling back) but retains its active status. +It may only be unblocked by a fresh `pulseJobUpdate` call. + +NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make any +progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or +`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress +provided the pulse interval has not expired. + + +Canary Deployments +------------------ + +Canary deployments are a pattern for rolling out updates to a subset of job instances, +in order to test different code versions alongside the actual production job. +It is a risk-mitigation strategy for job owners and commonly used in a form where +job instance 0 runs with a different configuration than the instances 1-N. + +For example, consider a job with 4 instances that each +request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified +in the configuration file `hello_world.aurora`. If you want to +update it so it requests 2 GB of RAM instead of 1. You can create a new +configuration file to do that called `new_hello_world.aurora` and +issue + + aurora update start <job_key_value>/0-1 new_hello_world.aurora + +This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space, +while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3 +dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space. + +So that means there are two simultaneous task configurations for the same job +at the same time, just valid for different ranges of instances. While this isn't a recommended +pattern, it is valid and supported by the Aurora scheduler. Added: aurora/site/source/documentation/latest/features/multitenancy.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/multitenancy.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/multitenancy.md (added) +++ aurora/site/source/documentation/latest/features/multitenancy.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,62 @@ +Multitenancy +============ + +Aurora is a multi-tenant system that can run jobs of multiple clients/tenants. +Going beyond the [resource isolation on an individual host](resource-isolation.md), it is +crucial to prevent those jobs from stepping on each others toes. + + +Job Namespaces +-------------- + +The namespace for jobs in Aurora follows a hierarchical structure. This is meant to make it easier +to differentiate between different jobs. A job key consists of four parts. The four parts are +`<cluster>/<role>/<environment>/<jobname>` in that order: + +* Cluster refers to the name of a particular Aurora installation. +* Role names are user accounts. +* Environment names are namespaces. +* Jobname is the custom name of your job. + +Role names correspond to user accounts. They are used for +[authentication](../operations/security.md), as the linux user used to run jobs, and for the +assignment of [quota](#preemption). If you don't know what accounts are available, contact your +sysadmin. + +The environment component in the job key, serves as a namespace. 
The values for +environment are validated in the client and the scheduler so as to allow any of `devel`, `test`, +`production`, and any value matching the regular expression `staging[0-9]*`. + +None of the values imply any difference in the scheduling behavior. Conventionally, the +"environment" is set so as to indicate a certain level of stability in the behavior of the job +by ensuring that an appropriate level of testing has been performed on the application code. e.g. +in the case of a typical Job, releases may progress through the following phases in order of +increasing level of stability: `devel`, `test`, `staging`, `production`. + + +Preemption +---------- + +In order to guarantee that important production jobs are always running, Aurora supports +preemption. + +Let's consider we have a pending job that is candidate for scheduling but resource shortage pressure +prevents this. Active tasks can become the victim of preemption, if: + + - both candidate and victim are owned by the same role and the + [priority](../reference/configuration.md#job-objects) of a victim is lower than the + [priority](../reference/configuration.md#job-objects) of the candidate. + - OR a victim is non-[production](../reference/configuration.md#job-objects) and the candidate is + [production](../reference/configuration.md#job-objects). + +In other words, tasks from [production](../reference/configuration.md#job-objects) jobs may preempt +tasks from any non-production job. However, a production task may only be preempted by tasks from +production jobs in the same role with higher [priority](../reference/configuration.md#job-objects). + +Aurora requires resource quotas for [production non-dedicated jobs](../reference/configuration.md#job-objects). +Quota is enforced at the job role level and when set, defines a non-preemptible pool of compute resources within +that role. All job types (service, adhoc or cron) require role resource quota unless a job has +[dedicated constraint set](constraints.md#dedicated-attribute). + +To grant quota to a particular role in production, an operator can use the command +`aurora_admin set_quota`. Added: aurora/site/source/documentation/latest/features/resource-isolation.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/resource-isolation.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/resource-isolation.md (added) +++ aurora/site/source/documentation/latest/features/resource-isolation.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,167 @@ +Resources Isolation and Sizing +============================== + +- [Isolation](#isolation) +- [Sizing](#sizing) +- [Oversubscription](#oversubscription) + + +Isolation +--------- + +Aurora is a multi-tenant system; a single software instance runs on a +server, serving multiple clients/tenants. To share resources among +tenants, it implements isolation of: + +* CPU +* memory +* disk space + +CPU is a soft limit, and handled differently from memory and disk space. +Too low a CPU value results in throttling your application and +slowing it down. Memory and disk space are both hard limits; when your +application goes over these values, it's killed. + +### CPU Isolation + +Mesos uses a quota based CPU scheduler (the *Completely Fair Scheduler*) +to provide consistent and predictable performance. 
This is effectively +a guarantee of resources -- you receive at least what you requested, but +also no more than you've requested. + +The scheduler gives applications a CPU quota for every 100 ms interval. +When an application uses its quota for an interval, it is throttled for +the rest of the 100 ms. Usage resets for each interval and unused +quota does not carry over. + +For example, an application specifying 4.0 CPU has access to 400 ms of +CPU time every 100 ms. This CPU quota can be used in different ways, +depending on the application and available resources. Consider the +scenarios shown in this diagram. + + + +* *Scenario A*: the application can use up to 4 cores continuously for +every 100 ms interval. It is never throttled and starts processing +new requests immediately. + +* *Scenario B* : the application uses up to 8 cores (depending on +availability) but is throttled after 50 ms. The CPU quota resets at the +start of each new 100 ms interval. + +* *Scenario C* : is like Scenario A, but there is a garbage collection +event in the second interval that consumes all CPU quota. The +application throttles for the remaining 75 ms of that interval and +cannot service requests until the next interval. In this example, the +garbage collection finished in one interval but, depending on how much +garbage needs collecting, it may take more than one interval and further +delay service of requests. + +*Technical Note*: Mesos considers logical cores, also known as +hyperthreading or SMT cores, as the unit of CPU. + +### Memory Isolation + +Mesos uses dedicated memory allocation. Your application always has +access to the amount of memory specified in your configuration. The +application's memory use is defined as the sum of the resident set size +(RSS) of all processes in a shard. Each shard is considered +independently. + +In other words, say you specified a memory size of 10GB. Each shard +would receive 10GB of memory. If an individual shard's memory demands +exceed 10GB, that shard is killed, but the other shards continue +working. + +*Technical note*: Total memory size is not enforced at allocation time, +so your application can request more than its allocation without getting +an ENOMEM. However, it will be killed shortly after. + +### Disk Space + +Disk space used by your application is defined as the sum of the files' +disk space in your application's directory, including the `stdout` and +`stderr` logged from your application. Each shard is considered +independently. You should use off-node storage for your application's +data whenever possible. + +In other words, say you specified disk space size of 100MB. Each shard +would receive 100MB of disk space. If an individual shard's disk space +demands exceed 100MB, that shard is killed, but the other shards +continue working. + +After your application finishes running, its allocated disk space is +reclaimed. Thus, your job's final action should move any disk content +that you want to keep, such as logs, to your home file system or other +less transitory storage. Disk reclamation takes place an undefined +period after the application finish time; until then, the disk contents +are still available but you shouldn't count on them being so. + +*Technical note* : Disk space is not enforced at write so your +application can write above its quota without getting an ENOSPC, but it +will be killed shortly after. This is subject to change. 
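For reference, all three of the limits described above are requested per task through the
`Resources` object in the job configuration. A minimal sketch; the process command and the sizing
values are purely illustrative:

    server = Process(name = 'server', cmdline = './run_server.sh')

    task = Task(
      name = 'webservice',
      processes = [server],
      # CPU is a soft limit (throttling above 1.5 cores); RAM and disk are hard limits
      # (the instance is killed if it exceeds them).
      resources = Resources(cpu = 1.5, ram = 2*GB, disk = 4*GB)
    )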
+ +### Other Resources + +Other resources, such as network bandwidth, do not have any performance +guarantees. For some resources, such as memory bandwidth, there are no +practical sharing methods so some application combinations collocated on +the same host may cause contention. + + +Sizing +------- + +### CPU Sizing + +To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value +that lets the task run at its desired performance when at peak load +distributed across all shards. Include reserve capacity of at least 50%, +possibly more, depending on how critical your service is (or how +confident you are about your original estimate : -)), ideally by +increasing the number of shards to also improve resiliency. When running +your application, observe its CPU stats over time. If consistently at or +near your quota during peak load, you should consider increasing either +per-shard CPU or the number of shards. + +## Memory Sizing + +Size for your application's peak requirement. Observe the per-instance +memory statistics over time, as memory requirements can vary over +different periods. Remember that if your application exceeds its memory +value, it will be killed, so you should also add a safety margin of +around 10-20%. If you have the ability to do so, you may also want to +put alerts on the per-instance memory. + +## Disk Space Sizing + +Size for your application's peak requirement. Rotate and discard log +files as needed to stay within your quota. When running a Java process, +add the maximum size of the Java heap to your disk space requirement, in +order to account for an out of memory error dumping the heap +into the application's sandbox space. + + +Oversubscription +---------------- + +**WARNING**: This feature is currently in alpha status. Do not use it in production clusters! + +Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/) +by oversubscribing machine resources by the amount deemed safe to not affect the existing +non-revocable tasks. Aurora now supports revocable jobs via a `tier` setting set to `revocable` +value. + +The Aurora scheduler must be configured to receive revocable offers from Mesos and accept revocable +jobs. If not configured properly revocable tasks will never get assigned to hosts and will stay in +`PENDING`. Set these scheduler flag to allow receiving revocable Mesos offers: + + -receive_revocable_resources=true + +Specify a tier configuration file path (unless you want to use the [default](../../src/main/resources/org/apache/aurora/scheduler/tiers.json)): + + -tier_config=path/to/tiers/config.json + + +See the [Configuration Reference](../references/configuration.md) for details on how to mark a job +as being revocable. Added: aurora/site/source/documentation/latest/features/service-discovery.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/service-discovery.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/service-discovery.md (added) +++ aurora/site/source/documentation/latest/features/service-discovery.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,44 @@ +Service Discovery +================= + +It is possible for the Aurora executor to announce tasks into ServerSets for +the purpose of service discovery. 
ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox) +of which there are several reference implementations: + + - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp) + - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221) + - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51) + +These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala). + +For more information about how to configure announcing, see the [Configuration Reference](../reference/configuration.md). + +Using Mesos DiscoveryInfo +------------------------- +Experimental support for populating DiscoveryInfo in Mesos is introduced in Aurora. This can be used to build +custom service discovery system not using zookeeper. Please see `Service Discovery` section in +[Mesos Framework Development guide](http://mesos.apache.org/documentation/latest/app-framework-development-guide/) for +explanation of the protobuf message in Mesos. + +To use this feature, please enable `--populate_discovery_info` flag on scheduler. All jobs started by scheduler +afterwards will have their portmap populated to Mesos and discoverable in `/state` endpoint in Mesos master and agent. + +### Using Mesos DNS +An example is using [Mesos-DNS](https://github.com/mesosphere/mesos-dns), which is able to generate multiple DNS +records. With current implementation, the example job with key `devcluster/vagrant/test/http-example` generates at +least the following: + +1. An A record for `http_example.test.vagrant.twitterscheduler.mesos` (which only includes IP address); +2. A [SRV record](https://en.wikipedia.org/wiki/SRV_record) for + `_http_example.test.vagrant._tcp.twitterscheduler.mesos`, which includes IP address and every port. This should only + be used if the service has one port. +3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.twitterscheduler.mesos` for each port name + defined. This should be used when the service has multiple ports. + +Things to note: + +1. The domain part (".mesos" in above example) can be configured in [Mesos DNS](http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html); +2. The `twitterscheduler` part is the lower-case of framework name, which is not configurable right now (see + [TWITTER_SCHEDULER_NAME](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java#L98)); +3. Right now, portmap and port aliases in announcer object are not reflected in DiscoveryInfo, therefore not visible in + Mesos DNS records either. This is because they are only resolved in thermos executors. 
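For context, the ServerSet announcing described at the top of this page is opted into per job via
the `Announcer` object in the job configuration; see the
[Configuration Reference](../reference/configuration.md) for the authoritative field list. A
minimal sketch with hypothetical names and values:

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'prod',
        role = 'www-data',
        name = 'http_example',
        task = SequentialTask(
          processes = [
            Process(name = 'server',
                    cmdline = './run_server.sh --port {{thermos.ports[http]}}')],
          resources = Resources(cpu = 0.5, ram = 128*MB, disk = 128*MB)),
        # Register the dynamically assigned 'http' port in the ZooKeeper ServerSet.
        announce = Announcer(primary_port = 'http'),
      )
    ]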
\ No newline at end of file Added: aurora/site/source/documentation/latest/features/services.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/services.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/services.md (added) +++ aurora/site/source/documentation/latest/features/services.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,99 @@ +Long-running Services +===================== + +Jobs that are always restart on completion, whether successful or unsuccessful, +are called services. This is useful for long-running processes +such as webservices that should always be running, unless stopped explicitly. + + +Service Specification +--------------------- + +A job is identified as a service by the presence of the flag +``service=True` in the [`Job`](../reference/configuration.md#job-objects) object. +The `Service` alias can be used as shorthand for `Job` with `service=True`. + +Example (available in the [Vagrant environment](../getting-started/vagrant.md)): + + $ cat /vagrant/examples/jobs/hello_world.aurora + hello = Process( + name = 'hello', + cmdline = """ + while true; do + echo hello world + sleep 10 + done + """) + + task = SequentialTask( + processes = [hello], + resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB) + ) + + jobs = [ + Service( + task = task, + cluster = 'devcluster', + role = 'www-data', + environment = 'prod', + name = 'hello' + ) + ] + + +Jobs without the service bit set only restart up to `max_task_failures` times and only if they +terminated unsuccessfully either due to human error or machine failure (see the +[`Job`](../reference/configuration.md#job-objects) object for details). + + +Ports +----- + +In order to be useful, most services have to bind to one or more ports. Aurora enables this +usecase via the [`thermos.ports` namespace](../reference/configuration.md#thermos-namespace) that +allows to request arbitrarily named ports: + + + nginx = Process( + name = 'nginx', + cmdline = './run_nginx.sh -port {{thermos.ports[http]}}' + ) + + +When this process is included in a job, the job will be allocated a port, and the command line +will be replaced with something like: + + ./run_nginx.sh -port 42816 + +Where 42816 happens to be the allocated port. + +For details on how to enable clients to discover this dynamically assigned port, see our +[Service Discovery](service-discovery.md) documentation. + + +Health Checking +--------------- + +Typically, the Thermos executor monitors processes within a task only by liveness of the forked +process. In addition to that, Aurora has support for rudimentary health checking: Either via HTTP +via custom shell scripts. + +For example, simply by requesting a `health` port, a process can request to be health checked +via repeated calls to the `/health` endpoint: + + nginx = Process( + name = 'nginx', + cmdline = './run_nginx.sh -port {{thermos.ports[health]}}' + ) + +Please see the +[configuration reference](../reference/configuration.md#user-content-healthcheckconfig-objects) +for configuration options for this feature. + +You can pause health checking by touching a file inside of your sandbox, named `.healthchecksnooze`. +As long as that file is present, health checks will be disabled, enabling users to gather core +dumps or other performance measurements without worrying about Aurora's health check killing +their process. 
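As an aside, a minimal sketch of how a maintenance script running inside the sandbox might toggle
that file; the path is relative to the sandbox root and the surrounding workflow is hypothetical:

    from pathlib import Path

    snooze = Path('.healthchecksnooze')  # relative to the task sandbox
    snooze.touch()                       # health checks are skipped while this file exists
    # ... gather core dumps or other measurements here ...
    snooze.unlink()                      # re-enable health checking when finished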
+ +WARNING: Remember to remove this when you are done, otherwise your instance will have permanently +disabled health checks. Added: aurora/site/source/documentation/latest/features/sla-metrics.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/sla-metrics.md (added) +++ aurora/site/source/documentation/latest/features/sla-metrics.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,178 @@ +Aurora SLA Measurement +====================== + +- [Overview](#overview) +- [Metric Details](#metric-details) + - [Platform Uptime](#platform-uptime) + - [Job Uptime](#job-uptime) + - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\)) + - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\)) +- [Limitations](#limitations) + +## Overview + +The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level +Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform +and hosted services. + +The Aurora SLA feature is by default only enabled for service (non-cron) +production jobs (`"production=True"` in your `.aurora` config). It can be enabled for +non-production services by an operator via the scheduler command line flag `-sla_non_prod_metrics`. + +Counters that track SLA measurements are computed periodically within the scheduler. +The individual instance metrics are refreshed every minute (configurable via +`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by +relevant grouping types before exporting to scheduler `/vars` endpoint (when using `vagrant` +that would be `http://192.168.33.7:8081/vars`) + + +## Metric Details + +### Platform Uptime + +*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability +or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any +system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts +will not degrade this metric.* + +**Collection scope:** + +* Per job - `sla_<job_key>_platform_uptime_percent` +* Per cluster - `sla_cluster_platform_uptime_percent` + +**Units:** percent + +A fault in the task environment may cause the Aurora/Mesos to have different views on the task state +or lose track of the task existence. In such cases, the service task is marked as LOST and +rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING +for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between +task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime. + +Another example of a platform downtime event is the administrator-requested task rescheduling. This +happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and +rescheduled elsewhere. + +To accurately calculate Platform Uptime, we must separate platform incurred downtime from user +actions that put a service instance in a non-operational state. It is simpler to isolate +user-incurred downtime and treat all other downtime as platform incurred. + +Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks` +or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail +relevant to uptime calculations. 
By applying a special "SLA meaning" to exposed task state +transition records, we can build a deterministic downtime trace for every given service instance. + +A task going through a state transition carries one of three possible SLA meanings +(see [SlaAlgorithm.java](../../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for +sla-to-task-state mapping): + +* Task is UP: starts a period where the task is considered to be up and running from the Aurora + platform standpoint. + +* Task is DOWN: starts a period where the task cannot reach the UP state for some + non-user-related reason. Counts towards instance downtime. + +* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to + user initiated action or failure. We ignore this period for the uptime calculation purposes. + +This metric is recalculated over the last sampling period (last minute) to account for +any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the +sampling interval as well as adjacent REMOVED events. + +### Job Uptime + +*Percentage of the job instances considered to be in RUNNING state for the specified duration +relative to request time. This is a purely application side metric that is considering aggregate +uptime of all RUNNING instances. Any user- or platform initiated restarts directly affect +this metric.* + +**Collection scope:** We currently expose job uptime values at 5 pre-defined +percentiles (50th,75th,90th,95th and 99th): + +* `sla_<job_key>_job_uptime_50_00_sec` +* `sla_<job_key>_job_uptime_75_00_sec` +* `sla_<job_key>_job_uptime_90_00_sec` +* `sla_<job_key>_job_uptime_95_00_sec` +* `sla_<job_key>_job_uptime_99_00_sec` + +**Units:** seconds +You can also get customized real-time stats from aurora client. See `aurora sla -h` for +more details. + +### Median Time To Assigned (MTTA) + +*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined +metric that helps track the dependency of scheduling performance on the requested resources +(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).* + +**Collection scope:** + +* Per job - `sla_<job_key>_mtta_ms` +* Per cluster - `sla_cluster_mtta_ms` +* Per instance size (small, medium, large, x-large, xx-large). Size are defined in: +[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java) + * By CPU: + * `sla_cpu_small_mtta_ms` + * `sla_cpu_medium_mtta_ms` + * `sla_cpu_large_mtta_ms` + * `sla_cpu_xlarge_mtta_ms` + * `sla_cpu_xxlarge_mtta_ms` + * By RAM: + * `sla_ram_small_mtta_ms` + * `sla_ram_medium_mtta_ms` + * `sla_ram_large_mtta_ms` + * `sla_ram_xlarge_mtta_ms` + * `sla_ram_xxlarge_mtta_ms` + * By DISK: + * `sla_disk_small_mtta_ms` + * `sla_disk_medium_mtta_ms` + * `sla_disk_large_mtta_ms` + * `sla_disk_xlarge_mtta_ms` + * `sla_disk_xxlarge_mtta_ms` + +**Units:** milliseconds + +MTTA only considers instances that have already reached ASSIGNED state and ignores those +that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource +constraints) do not affect metric curves. + +### Median Time To Running (MTTR) + +*Median time a job waits for its tasks to reach RUNNING state. 
This is a comprehensive metric
+reflecting the overall time it takes for Aurora/Mesos to start executing user content.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mttr_ms`
+* Per cluster - `sla_cluster_mttr_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mttr_ms`
+    * `sla_cpu_medium_mttr_ms`
+    * `sla_cpu_large_mttr_ms`
+    * `sla_cpu_xlarge_mttr_ms`
+    * `sla_cpu_xxlarge_mttr_ms`
+  * By RAM:
+    * `sla_ram_small_mttr_ms`
+    * `sla_ram_medium_mttr_ms`
+    * `sla_ram_large_mttr_ms`
+    * `sla_ram_xlarge_mttr_ms`
+    * `sla_ram_xxlarge_mttr_ms`
+  * By DISK:
+    * `sla_disk_small_mttr_ms`
+    * `sla_disk_medium_mttr_ms`
+    * `sla_disk_large_mttr_ms`
+    * `sla_disk_xlarge_mttr_ms`
+    * `sla_disk_xxlarge_mttr_ms`
+
+**Units:** milliseconds
+
+MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
+unreasonable resource constraints) do not affect metric curves.
+
+## Limitations
+
+* The availability of Aurora SLA metrics is bound by the scheduler's availability.
+
+* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
+  Scheduler restarts may result in missed collections.

Added: aurora/site/source/documentation/latest/getting-started/overview.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/overview.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/overview.md (added)
+++ aurora/site/source/documentation/latest/getting-started/overview.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,110 @@
+Aurora System Overview
+======================
+
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
+
+
+Components
+----------
+
+It is important to have an understanding of the components that make up
+a functioning Aurora cluster.
+
+
+
+* **Aurora scheduler**
+  The scheduler is your primary interface to the work you run in your cluster. You will
+  instruct it to run jobs, and it will manage them in Mesos for you. You will also frequently use
+  the scheduler's read-only web interface as a heads-up display for what's running in your cluster.
+
+* **Aurora client**
+  The client (`aurora` command) is a command line tool that exposes primitives that you can use to
+  interact with the scheduler. The client operates on
+
+  Aurora also provides an admin client (`aurora_admin` command) that contains commands built for
+  cluster administrators. You can use this tool to do things like manage user quotas and manage
+  graceful maintenance on machines in the cluster.
+
+* **Aurora executor**
+  The executor (a.k.a. Thermos executor) is responsible for carrying out the workloads described in
+  the Aurora DSL (`.aurora` files). The executor is what actually executes user processes. It will
+  also perform health checking of tasks and register tasks in ZooKeeper for the purposes of dynamic
+  service discovery.
+
+* **Aurora observer**
+  The observer provides browser-based access to the status of individual tasks executing on worker
+  machines. It gives insight into the processes that are executing and facilitates browsing of
+  task sandbox directories.
+
+* **ZooKeeper**
+  [ZooKeeper](http://zookeeper.apache.org) is a distributed consensus system. In an Aurora cluster
+  it is used for reliable election of the leading Aurora scheduler and Mesos master. It is also
+  used as a vehicle for service discovery; see [Service Discovery](../features/service-discovery.md).
+
+* **Mesos master**
+  The master is responsible for tracking worker machines and performing accounting of their
+  resources. The scheduler interfaces with the master to control the cluster.
+
+* **Mesos agent**
+  The agent receives work assigned by the scheduler and executes it. It interfaces with Linux
+  isolation systems like cgroups, namespaces and Docker to manage the resource consumption of tasks.
+  When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
+  or Docker container depending upon the environment), which will in turn fork user processes.
+
+
+Jobs, Tasks and Processes
+--------------------------
+
+Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos
+cares about individual *tasks*, but typical jobs consist of dozens or
+hundreds of task replicas. Aurora provides a layer on top of Mesos with
+its `Job` abstraction. An Aurora `Job` consists of a task template and
+instructions for creating near-identical replicas of that task (modulo
+things like "instance id" or specific port numbers which may differ from
+machine to machine).
+
+How many tasks make up a Job varies. On a basic level, a Job consists of
+one task template and instructions for creating near-identical replicas of that task
+(otherwise referred to as "instances" or "shards").
+
+A task can merely be a single *process* corresponding to a single
+command line, such as `python2.7 my_script.py`. However, a task can also
+consist of many separate processes, which all run within a single
+sandbox. For example, a task might run multiple cooperating processes
+together, such as `logrotate`, `installer`, master, or slave processes. This is
+where Thermos comes in. While Aurora provides a `Job` abstraction on
+top of Mesos `Task`s, Thermos provides a `Process` abstraction
+underneath Mesos `Task`s and serves as part of the Aurora framework's
+executor.
+
+You define `Job`s, `Task`s, and `Process`es in a configuration file.
+Configuration files are written in Python, and make use of the
+[Pystachio](https://github.com/wickman/pystachio) templating language,
+along with specific Aurora, Mesos, and Thermos commands and methods.
+The configuration files typically end with a `.aurora` extension.
+
+Summary:
+
+* Aurora manages jobs made of tasks.
+* Mesos manages tasks made of processes.
+* Thermos manages processes.
+* All of the above is defined in `.aurora` configuration files.
+
+
+
+Each `Task` has a *sandbox* created when the `Task` starts and garbage
+collected when it finishes. All of a `Task`'s processes run in its
+sandbox, so processes can share state by using a shared current working
+directory.
+
+The sandbox garbage collection policy considers many factors, most
+importantly age and size. It makes a best-effort attempt to keep
+sandboxes around as long as possible post-task in order for service
+owners to inspect data and logs, should the `Task` have completed
+abnormally. But you should not design your applications assuming sandboxes
+will be around forever; instead, build log saving or other
+checkpointing mechanisms directly into your application or into your
+`Job` description.
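To make the Job / Task / Process layering and the shared sandbox concrete, here is a minimal sketch of a `.aurora` file (a hypothetical example, not one shipped with Aurora) in which two Processes share state simply by writing and reading a file in the Task's sandbox working directory. It uses the same DSL objects (`Process`, `SequentialTask`, `Resources`, `Job`) that the Hello World tutorial introduces:

```python
# Hypothetical .aurora sketch: two Processes sharing state through the sandbox.

# The first Process writes a file into the shared sandbox working directory...
producer = Process(
  name = 'producer',
  cmdline = 'echo "hello from the producer" > shared_state.txt')

# ...and the second Process reads that same file from the same directory.
consumer = Process(
  name = 'consumer',
  cmdline = 'cat shared_state.txt')

# Thermos runs the Processes in order; Mesos only sees the resulting Task.
shared_state_task = SequentialTask(
  processes = [producer, consumer],
  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 8*MB))

# Aurora manages the Job, i.e. the near-identical replicas ("instances") of the Task.
jobs = [
  Job(cluster = 'devcluster',
      environment = 'devel',
      role = 'www-data',
      name = 'shared_state_example',
      instances = 1,
      task = shared_state_task)
]
```

Because both Processes run in the same sandbox, `consumer` can read what `producer` wrote without any explicit data transfer; once the Task finishes, that file is subject to the garbage collection policy described above.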
+ Added: aurora/site/source/documentation/latest/getting-started/tutorial.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/tutorial.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/getting-started/tutorial.md (added) +++ aurora/site/source/documentation/latest/getting-started/tutorial.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,258 @@ +# Aurora Tutorial + +This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`") +a hello world program on Mesos. This is the recommended document for new Aurora users +to start getting up to speed on the system. + +- [Prerequisite](#setup-install-aurora) +- [The Script](#the-script) +- [Aurora Configuration](#aurora-configuration) +- [Creating the Job](#creating-the-job) +- [Watching the Job Run](#watching-the-job-run) +- [Cleanup](#cleanup) +- [Next Steps](#next-steps) + + +## Prerequisite + +This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md). +However, in general the instructions are also applicable to any other +[Aurora installation](../operations/installation.md). + +Unless otherwise stated, all commands are to be run from the root of the aurora +repository clone. + + +## The Script + +Our "hello world" application is a simple Python script that loops +forever, displaying the time every few seconds. Copy the code below and +put it in a file named `hello_world.py` in the root of your Aurora repository clone +(Note: this directory is the same as `/vagrant` inside the Vagrant VMs). + +The script has an intentional bug, which we will explain later on. + +<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh. +--> +```python +import time + +def main(): + SLEEP_DELAY = 10 + # Python experts - ignore this blatant bug. + for i in xrang(100): + print("Hello world! The time is now: %s. Sleeping for %d secs" % ( + time.asctime(), SLEEP_DELAY)) + time.sleep(SLEEP_DELAY) + +if __name__ == "__main__": + main() +``` + +## Aurora Configuration + +Once we have our script/program, we need to create a *configuration +file* that tells Aurora how to manage and launch our Job. Save the below +code in the file `hello_world.aurora`. + +<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh. +--> +```python +pkg_path = '/vagrant/hello_world.py' + +# we use a trick here to make the configuration change with +# the contents of the file, for simplicity. in a normal setting, packages would be +# versioned, and the version number would be changed in the configuration. +import hashlib +with open(pkg_path, 'rb') as f: + pkg_checksum = hashlib.md5(f.read()).hexdigest() + +# copy hello_world.py into the local sandbox +install = Process( + name = 'fetch_package', + cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum)) + +# run the script +hello_world = Process( + name = 'hello_world', + cmdline = 'python -u hello_world.py') + +# describe the task +hello_world_task = SequentialTask( + processes = [install, hello_world], + resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB)) + +jobs = [ + Service(cluster = 'devcluster', + environment = 'devel', + role = 'www-data', + name = 'hello_world', + task = hello_world_task) +] +``` + +There is a lot going on in that configuration file: + +1. From a "big picture" viewpoint, it first defines two +Processes. 
Then it defines a Task that runs the two Processes in the +order specified in the Task definition, as well as specifying what +computational and memory resources are available for them. Finally, +it defines a Job that will schedule the Task on available and suitable +machines. This Job is the sole member of a list of Jobs; you can +specify more than one Job in a config file. + +2. At the Process level, it specifies how to get your code into the +local sandbox in which it will run. It then specifies how the code is +actually run once the second Process starts. + +For more about Aurora configuration files, see the [Configuration +Tutorial](../reference/configuration-tutorial.md) and the [Configuration +Reference](../reference/configuration.md) (preferably after finishing this +tutorial). + + +## Creating the Job + +We're ready to launch our job! To do so, we use the Aurora Client to +issue a Job creation request to the Aurora scheduler. + +Many Aurora Client commands take a *job key* argument, which uniquely +identifies a Job. A job key consists of four parts, each separated by a +"/". The four parts are `<cluster>/<role>/<environment>/<jobname>` +in that order: + +* Cluster refers to the name of a particular Aurora installation. +* Role names are user accounts existing on the slave machines. If you +don't know what accounts are available, contact your sysadmin. +* Environment names are namespaces; you can count on `test`, `devel`, +`staging` and `prod` existing. +* Jobname is the custom name of your job. + +When comparing two job keys, if any of the four parts is different from +its counterpart in the other key, then the two job keys identify two separate +jobs. If all four values are identical, the job keys identify the same job. + +The `clusters.json` [client configuration](../reference/client-cluster-configuration.md) +for the Aurora scheduler defines the available cluster names. +For Vagrant, from the top-level of your Aurora repository clone, do: + + $ vagrant ssh + +Followed by: + + vagrant@aurora:~$ cat /etc/aurora/clusters.json + +You'll see something like the following. The `name` value shown here, corresponds to a job key's cluster value. + +```javascript +[{ + "name": "devcluster", + "zk": "192.168.33.7", + "scheduler_zk_path": "/aurora/scheduler", + "auth_mechanism": "UNAUTHENTICATED", + "slave_run_directory": "latest", + "slave_root": "/var/lib/mesos" +}] +``` + +The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as +specified by its job key and configuration file arguments and runs it. + + aurora job create <cluster>/<role>/<environment>/<jobname> <config_file> + +Or for our example: + + aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora + +After entering our virtual machine using `vagrant ssh`, this returns: + + vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora + INFO] Creating job hello_world + INFO] Checking status of devcluster/www-data/devel/hello_world + Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world + + +## Watching the Job Run + +Now that our job is running, let's see what it's doing. 
Access the
+scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
+or, when using `vagrant`, at `http://192.168.33.7:8081/scheduler`.
+First we see what Jobs are scheduled:
+
+
+
+Click on your user name, which in this case was `www-data`, and we see the Jobs associated
+with that role:
+
+
+
+If you click on your `hello_world` Job, you'll see:
+
+
+
+Oops, looks like our first job didn't quite work! The task is temporarily throttled for
+having failed on every attempt of the Aurora scheduler to run it. We have to figure out
+what is going wrong.
+
+On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job.
+
+
+
+We can navigate to the Task page of a failed run by clicking on the host link.
+
+
+
+Once there, we see that the `hello_world` process failed. The Task page
+captures the standard error and standard output streams and makes them available.
+Clicking through to `stderr` on the failed `hello_world` process, we see what happened.
+
+
+
+It looks like we made a typo in our Python script. We wanted `xrange`,
+not `xrang`. Edit the `hello_world.py` script to use the correct function
+and save it as `hello_world_v2.py`. Then update the `hello_world.aurora`
+configuration to point at the new file.
+
+In order to try again, we can now instruct the scheduler to update our job:
+
+    vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+    INFO] Starting update for: hello_world
+    Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b
+
+This time, the task comes up.
+
+
+
+By again clicking on the host, we inspect the Task page and see that the
+`hello_world` process is running.
+
+
+
+We then inspect the output by clicking on `stdout` and see our process'
+output:
+
+
+
+## Cleanup
+
+Now that we're done, we kill the job using the Aurora client:
+
+    vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world
+    INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
+    INFO] Instances to be killed: [0]
+    Successfully killed instances [0]
+    Job killall succeeded
+
+The job page now shows the `hello_world` tasks as completed.
+
+
+
+## Next Steps
+
+Now that you've finished this Tutorial, you should read or do the following:
+
+- [The Aurora Configuration Tutorial](../reference/configuration-tutorial.md), which provides more examples
+  and best practices for writing Aurora configurations. You should also look at
+  the [Aurora Configuration Reference](../reference/configuration.md).
+- Explore the Aurora Client: use `aurora -h`, and read the
+  [Aurora Client Commands](../reference/client-commands.md) document.

Added: aurora/site/source/documentation/latest/getting-started/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/vagrant.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/vagrant.md (added)
+++ aurora/site/source/documentation/latest/getting-started/vagrant.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,147 @@
+A local Cluster with Vagrant
+============================
+
+This document shows you how to configure a complete cluster using a virtual machine. This setup
+replicates a real cluster in your development machine as closely as possible.
After you complete +the steps outlined here, you will be ready to create and run your first Aurora job. + +The following sections describe these steps in detail: + +1. [Overview](#user-content-overview) +1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant) +1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository) +1. [Start the local cluster](#user-content-start-the-local-cluster) +1. [Log onto the VM](#user-content-log-onto-the-vm) +1. [Run your first job](#user-content-run-your-first-job) +1. [Rebuild components](#user-content-rebuild-components) +1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster) +1. [Troubleshooting](#user-content-troubleshooting) + + +Overview +-------- + +The Aurora distribution includes a set of scripts that enable you to create a local cluster in +your development machine. These scripts use [Vagrant](https://www.vagrantup.com/) and +[VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once the +virtual machine is running, the scripts install and initialize Aurora and any required components +to create the local cluster. + + +Install VirtualBox and Vagrant +------------------------------ + +First, download and install [VirtualBox](https://www.virtualbox.org/) on your development machine. + +Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the installation +was successful, open a terminal window and type the `vagrant` command. You should see a list of +common commands for this tool. + + +Clone the Aurora repository +--------------------------- + +To obtain the Aurora source distribution, clone its Git repository using the following command: + + git clone git://git.apache.org/aurora.git + + +Start the local cluster +----------------------- + +Now change into the `aurora/` directory, which contains the Aurora source code and +other scripts and tools: + + cd aurora/ + +To start the local cluster, type the following command: + + vagrant up + +This command uses the configuration scripts in the Aurora distribution to: + +* Download a Linux system image. +* Start a virtual machine (VM) and configure it. +* Install the required build tools on the VM. +* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and +[Zookeeper](http://zookeeper.apache.org/)) on the VM. +* Build and install Aurora from source on the VM. +* Start Aurora's services on the VM. + +This process takes several minutes to complete. + +You may notice a warning that guest additions in the VM don't match your version of VirtualBox. +This should generally be harmless, but you may wish to install a vagrant plugin to take care of +mismatches like this for you: + + vagrant plugin install vagrant-vbguest + +With this plugin installed, whenever you `vagrant up` the plugin will upgrade the guest additions +for you when a version mis-match is detected. You can read more about the plugin +[here](https://github.com/dotless-de/vagrant-vbguest). + +To verify that Aurora is running on the cluster, visit the following URLs: + +* Scheduler - http://192.168.33.7:8081 +* Observer - http://192.168.33.7:1338 +* Mesos Master - http://192.168.33.7:5050 +* Mesos Slave - http://192.168.33.7:5051 + + +Log onto the VM +--------------- + +To SSH into the VM, run the following command in your development machine: + + vagrant ssh + +To verify that Aurora is installed in the VM, type the `aurora` command. You should see a list +of arguments and possible commands. 
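You can also script a quick health check of the web endpoints listed under *Start the local cluster*. The following small helper is a hypothetical example, not shipped with Aurora; it assumes the default Vagrant IP `192.168.33.7` and the ports shown above, and can be run from your development machine:

```python
# check_cluster.py -- hypothetical helper: verify the local Vagrant cluster
# endpoints are responding. Works with both Python 2 and Python 3.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

ENDPOINTS = [
    ('scheduler',    'http://192.168.33.7:8081'),
    ('observer',     'http://192.168.33.7:1338'),
    ('mesos-master', 'http://192.168.33.7:5050'),
    ('mesos-slave',  'http://192.168.33.7:5051'),
]

for name, url in ENDPOINTS:
    try:
        # Any successful HTTP response means the service is up and listening.
        urlopen(url, timeout=5)
        print('%-12s OK   %s' % (name, url))
    except Exception as e:
        print('%-12s FAIL %s (%s)' % (name, url, e))
```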
+
+The `/vagrant` directory on the VM is mapped to the `aurora/` local directory
+from which you started the cluster. You can edit files inside this directory in your development
+machine and access them from the VM under `/vagrant`.
+
+A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which you
+will use in client commands.
+
+
+Run your first job
+------------------
+
+Now that your cluster is up and running, you are ready to define and run your first job in Aurora.
+For more information, see the [Aurora Tutorial](tutorial.md).
+
+
+Rebuild components
+------------------
+
+If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
+command on the VM to build and restart a component. This is considerably faster than destroying
+and rebuilding your VM.
+
+`aurorabuild` accepts a list of components to build and update. To get a list of supported
+components, invoke the `aurorabuild` command with no arguments. For example, to rebuild and
+restart just the client, run:
+
+    vagrant ssh -c 'aurorabuild client'
+
+
+Shut down or delete your local cluster
+--------------------------------------
+
+To shut down your local cluster, run the `vagrant halt` command in your development machine. To
+start it again, run the `vagrant up` command.
+
+Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command `vagrant destroy` to turn off and delete the virtual machine.
+
+
+Troubleshooting
+---------------
+
+Most Vagrant-related problems can be fixed with the following steps:
+
+* Destroying the vagrant environment with `vagrant destroy`
+* Killing any orphaned VMs (see AURORA-499) with the VirtualBox UI or the `VBoxManage` command line tool
+* Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
+* Bringing up the vagrant environment with `vagrant up`

Modified: aurora/site/source/documentation/latest/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/index.html.md?rev=1739402&r1=1739401&r2=1739402&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/index.html.md (original)
+++ aurora/site/source/documentation/latest/index.html.md Sat Apr 16 04:23:06 2016
@@ -1,44 +1,73 @@
 ## Introduction
-Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run long-running services that take advantage of Apache Mesos' scalability, fault-tolerance, and resource isolation. This documentation has been organized into sections with three audiences in mind:
- * Users: General information about the project and to learn how to run an Aurora job.
- * Operators: For those that wish to manage and fine-tune an Aurora cluster.
- * Developers: All the information you need to start modifying Aurora and contributing back to the project.
-
-We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or the `#aurora` IRC channel on `irc.freenode.net`.
- -## Users - * [Install Aurora on virtual machines on your private machine](/documentation/latest/vagrant/) - * [Hello World Tutorial](/documentation/latest/tutorial/) - * [User Guide](/documentation/latest/user-guide/) - * [Configuration Tutorial](/documentation/latest/configuration-tutorial/) - * [Aurora + Thermos Reference](/documentation/latest/configuration-reference/) - * [Command Line Client](/documentation/latest/client-commands/) - * [Client cluster configuration](/documentation/latest/client-cluster-configuration/) - * [Cron Jobs](/documentation/latest/cron-jobs/) +Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run +long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability, +fault-tolerance, and resource isolation. + +We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or +the `#aurora` IRC channel on `irc.freenode.net`. + + +## Getting Started +Information for everyone new to Apache Aurora. + + * [Aurora System Overview](getting-started/overview.md) + * [Hello World Tutorial](getting-started/tutorial.md) + * [Local cluster with Vagrant](getting-started/vagrant.md) + +## Features +Description of important Aurora features. + + * [Containers](features/containers.md) + * [Cron Jobs](features/cron-jobs.md) + * [Job Updates](features/job-updates.md) + * [Multitenancy](features/multitenancy.md) + * [Resource Isolation](features/resource-isolation.md) + * [Scheduling Constraints](features/constraints.md) + * [Services](features/services.md) + * [Service Discovery](features/service-discovery.md) + * [SLA Metrics](features/sla-metrics.md) ## Operators - * [Installation](/documentation/latest/installing/) - * [Deployment and cluster configuration](/documentation/latest/deploying-aurora-scheduler/) - * [Security](/documentation/latest/security/) - * [Monitoring](/documentation/latest/monitoring/) - * [Hooks for Aurora Client API](/documentation/latest/hooks/) - * [Scheduler Storage](/documentation/latest/storage/) - * [Scheduler Storage and Maintenance](/documentation/latest/storage-config/) - * [SLA Measurement](/documentation/latest/sla/) - * [Resource Isolation and Sizing](/documentation/latest/resources/) +For those that wish to manage and fine-tune an Aurora cluster. + + * [Installation](operations/installation.md) + * [Configuration](operations/configuration.md) + * [Monitoring](operations/monitoring.md) + * [Security](operations/security.md) + * [Storage](operations/storage.md) + * [Backup](operations/backup-restore.md) + +## Reference +The complete reference of commands, configuration options, and scheduler internals. 
+ + * [Task lifecycle](reference/task-lifecycle.md) + * Configuration (`.aurora` files) + - [Configuration Reference](reference/configuration.md) + - [Configuration Tutorial](reference/configuration-tutorial.md) + - [Configuration Best Practices](reference/configuration-best-practices.md) + - [Configuration Templating](reference/configuration-templating.md) + * Aurora Client + - [Client Commands](reference/client-commands.md) + - [Client Hooks](reference/client-hooks.md) + - [Client Cluster Configuration](reference/client-cluster-configuration.md) + * [Scheduler Configuration](reference/scheduler-configuration.md) + +## Additional Resources + * [Tools integrating with Aurora](additional-resources/tools.md) + * [Presentation videos and slides](additional-resources/presentations.md) ## Developers +All the information you need to start modifying Aurora and contributing back to the project. + * [Contributing to the project](contributing/) - * [Developing the Aurora Scheduler](/documentation/latest/developing-aurora-scheduler/) - * [Developing the Aurora Client](/documentation/latest/developing-aurora-client/) - * [Committers Guide](/documentation/latest/committers/) - * [Design Documents](/documentation/latest/design-documents/) - * [Deprecation Guide](/documentation/latest/thrift-deprecation/) - * [Build System](/documentation/latest/build-system/) - * [Generating test resources](/documentation/latest/test-resource-generation/) + * [Committer's Guide](development/committers-guide.md) + * [Design Documents](development/design-documents.md) + * Developing the Aurora components: + - [Client](development/client.md) + - [Scheduler](development/scheduler.md) + - [Scheduler UI](development/ui.md) + - [Thermos](development/thermos.md) + - [Thrift structures](development/thrift.md) -## Additional Resources - * [Tools integrating with Aurora](/documentation/latest/tools/) - * [Presentation videos and slides](/documentation/latest/presentations/)
