Added: aurora/site/source/documentation/latest/development/scheduler.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/scheduler.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/scheduler.md (added) +++ aurora/site/source/documentation/latest/development/scheduler.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,118 @@ +Developing the Aurora Scheduler +=============================== + +The Aurora scheduler is written in Java code and built with [Gradle](http://gradle.org). + + +Prerequisite +============ + +When using Apache Aurora checked out from the source repository or the binary +distribution, the Gradle wrapper and JavaScript dependencies are provided. +However, you need to manually install them when using the source release +downloads: + +1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org) +2. From the root directory of the Apache Aurora project generate the Gradle +wrapper by running: + + gradle wrapper + + +Getting Started +=============== + +You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then + + ./gradlew tasks + +will bootstrap the build system and show available tasks. This can take a while the first time you +run it but subsequent runs will be much faster due to cached artifacts. + +Running the Tests +----------------- +Aurora has a comprehensive unit test suite. To run the tests use + + ./gradlew build + +Gradle will only re-run tests when dependencies of them have changed. To force a re-run of all +tests use + + ./gradlew clean build + +Running the build with code quality checks +------------------------------------------ +To speed up development iteration, the plain gradle commands will not run static analysis tools. +However, you should run these before posting a review diff, and **always** run this before pushing a +commit to origin/master. + + ./gradlew build -Pq + +Running integration tests +------------------------- +To run the same tests that are run in the Apache Aurora continuous integration +environment: + + ./build-support/jenkins/build.sh + +In addition, there is an end-to-end test that runs a suite of aurora commands +using a virtual cluster: + + ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh + +Creating a bundle for deployment +-------------------------------- +Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with + + ./gradlew distZip + +or a tar file containing the same files with + + ./gradlew distTar + +The output file will be written to `dist/distributions/aurora-scheduler.zip` or +`dist/distributions/aurora-scheduler.tar`. + + + +Developing Aurora Java code +=========================== + +Setting up an IDE +----------------- +Gradle can generate project files for your IDE. To generate an IntelliJ IDEA project run + + ./gradlew idea + +and import the generated `aurora.ipr` file. + +Adding or Upgrading a Dependency +-------------------------------- +New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`. +For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block: + + compile 'com.example:example-lib:1.0' + +NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the +Apache Foundation's third-party licensing +[policy](http://www.apache.org/legal/resolved.html#category-x). 


Developing the Aurora Build System
==================================

Bootstrapping Gradle
--------------------
The following files were autogenerated by `gradle wrapper` using gradle's
[Wrapper](http://www.gradle.org/docs/current/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and
should not be modified directly:

    ./gradlew
    ./gradlew.bat
    ./gradle/wrapper/gradle-wrapper.jar
    ./gradle/wrapper/gradle-wrapper.properties

To upgrade Gradle unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the
repository root and commit the changed files.
Added: aurora/site/source/documentation/latest/development/thermos.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thermos.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/thermos.md (added) +++ aurora/site/source/documentation/latest/development/thermos.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,126 @@ +The Python components of Aurora are built using [Pants](https://pantsbuild.github.io). + + +Python Build Conventions +======================== +The Python code is laid out according to the following conventions: + +1. 1 `BUILD` per 3rd level directory. For a list of current top-level packages run: + + % find src/main/python -maxdepth 3 -mindepth 3 -type d |\ + while read dname; do echo $dname |\ + sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done + +2. Each `BUILD` file exports 1 + [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library) + that provides a + [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py) + containing each + [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary) + in the `BUILD` file, named the same as the directory it's in so that it can be referenced + without a ':' character. The `sources` field in the `python_library` will almost always be + `rglobs('*.py')`. + +3. Other BUILD files may only depend on this single public `python_library` + target. Any other target is considered a private implementation detail and + should be prefixed with an `_`. + +4. `python_binary` targets are always named the same as the exported console script. + +5. `python_binary` targets must have identical `dependencies` to the `python_library` exported + by the package and must use `entry_point`. + + The means a PEX file generated by pants will contain exactly the same files that will be + available on the `PYTHONPATH` in the case of `pip install` of the corresponding library + target. This will help our migration off of Pants in the future. + +Annotated example - apache.thermos.runner +----------------------------------------- + + % find src/main/python/apache/thermos/runner + src/main/python/apache/thermos/runner + src/main/python/apache/thermos/runner/__init__.py + src/main/python/apache/thermos/runner/thermos_runner.py + src/main/python/apache/thermos/runner/BUILD + % cat src/main/python/apache/thermos/runner/BUILD + # License boilerplate omitted + import os + + + # Private target so that a setup_py can exist without a circular dependency. Only targets within + # this file should depend on this. + python_library( + name = '_runner', + # The target covers every python file under this directory and subdirectories. + sources = rglobs('*.py'), + dependencies = [ + '3rdparty/python:twitter.common.app', + '3rdparty/python:twitter.common.log', + # Source dependencies are always referenced without a ':'. + 'src/main/python/apache/thermos/common', + 'src/main/python/apache/thermos/config', + 'src/main/python/apache/thermos/core', + ], + ) + + # Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an + # argument to ./pants binary. + python_binary( + name = 'thermos_runner', + # Use entry_point, not source so the files used here are the same ones tests see. + entry_point = 'apache.thermos.bin.thermos_runner', + dependencies = [ + # Notice that we depend only on the single private target from this BUILD file here. 
+ ':_runner', + ], + ) + + # The public library that everyone importing the runner symbols uses. + # The test targets and any other dependent source code should depend on this. + python_library( + name = 'runner', + dependencies = [ + # Again, notice that we depend only on the single private target from this BUILD file here. + ':_runner', + ], + # We always provide a setup_py. This will cause any dependee libraries to automatically + # reference this library in their requirements.txt rather than copy the source files into their + # sdist. + provides = setup_py( + # Conventionally named and versioned. + name = 'apache.thermos.runner', + version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(), + ).with_binaries({ + # Every binary in this file should also be repeated here. + # Always use the dict-form of .with_binaries so that commands with dashes in their names are + # supported. + # The console script name is always the same as the PEX with .pex stripped. + 'thermos_runner': ':thermos_runner', + }), + ) + + + +Thermos Test resources +====================== + +The Aurora source repository and distributions contain several +[binary files](../../src/test/resources/org/apache/thermos/root/checkpoints) to +qualify the backwards-compatibility of thermos with checkpoint data. Since +thermos persists state to disk, to be read by the thermos observer), it is important that we have +tests that prevent regressions affecting the ability to parse previously-written data. + +The files included represent persisted checkpoints that exercise different +features of thermos. The existing files should not be modified unless +we are accepting backwards incompatibility, such as with a major release. + +It is not practical to write source code to generate these files on the fly, +as source would be vulnerable to drift (e.g. due to refactoring) in ways +that would undermine the goal of ensuring backwards compatibility. + +The most common reason to add a new checkpoint file would be to provide +coverage for new thermos features that alter the data format. This is +accomplished by writing and running a +[job configuration](../reference/configuration.md) that exercises the feature, and +copying the checkpoint file from the sandbox directory, by default this is +`/var/run/thermos/checkpoints/<aurora task id>`. Added: aurora/site/source/documentation/latest/development/thrift.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/thrift.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/thrift.md (added) +++ aurora/site/source/documentation/latest/development/thrift.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,54 @@ +Thrift +====== + +Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in +client/server RPC protocol as well as for internal data storage. While Thrift is capable of +correctly handling additions and renames of the existing members, field removals must be done +carefully to ensure backwards compatibility and provide predictable deprecation cycle. This +document describes general guidelines for making Thrift schema changes to the existing fields in +[api.thrift](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift). + +It is highly recommended to go through the +[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on +basic Thrift schema concepts. 
+ +Checklist +--------- +Every existing Thrift schema modification is unique in its requirements and must be analyzed +carefully to identify its scope and expected consequences. The following checklist may help in that +analysis: +* Is this a new field/struct? If yes, go ahead +* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename +* Anything else, read further to make sure your change is properly planned + +Deprecation cycle +----------------- +Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle +must be followed: + +### vCurrent +Change is applied in a way that does not break scheduler/client with this version to +communicate with scheduler/client from vCurrent-1. +* Do not remove or rename the old field +* Add a new field as an eventual replacement of the old one and implement a dual read/write +anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns +are marked as `NOT NULL` +* Check [storage.thrift](../../api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if +the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also +necessary to perform a [DB migration](db-migration.md). +* Add a deprecation jira ticket into the vCurrent+1 release candidate +* Add a TODO for the deprecated field mentioning the jira ticket + +### vCurrent+1 +Finalize the change by removing the deprecated fields from the Thrift schema. +* Drop any dual read/write routines added in the previous version +* Remove thrift backfilling in scheduler +* Remove the deprecated Thrift field + +Testing +------- +It's always advisable to test your changes in the local vagrant environment to build more +confidence that you change is backwards compatible. It's easy to simulate different +client/scheduler versions by playing with `aurorabuild` command. See [this document](../getting-started/vagrant.md) +for more. + Added: aurora/site/source/documentation/latest/development/ui.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/development/ui.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/development/ui.md (added) +++ aurora/site/source/documentation/latest/development/ui.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,46 @@ +Developing the Aurora Scheduler UI +================================== + +Installing bower (optional) +---------------------------- +Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are +managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or +update JS libraries. Bower can be installed using the following command: + + npm install -g bower + +Bower depends on node.js and npm. The easiest way to install node on a mac is via brew: + + brew install node + +For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation. + +More info on installing and using bower can be found at: http://bower.io/. Once installed, you can +use the following commands to view and modify the bower repo at +3rdparty/javascript/bower_components + + bower list + bower install <library name> + bower remove <library name> + bower update <library name> + bower help + + +Faster Iteration in Vagrant +--------------------------- +The scheduler serves UI assets from the classpath. 
For production deployments this means the assets +are served from within a jar. However, for faster development iteration, the vagrant image is +configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of +`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where +your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in +your checkout will be reflected immediately in the UI served from within the vagrant image. + +The one caveat to this is that this path is under `dist` not `src`. This is because the assets must +be processed by gradle before they can be served. So, unfortunately, you cannot just save your local +changes and see them reflected in the UI, you must first run `./gradlew processResources`. This is +less than ideal, but better than having to restart the scheduler after every change. Additionally, +gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run: +`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the +task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does +allow for <5s from save to changes being visibile in the UI with no further action required on the +part of the developer. Added: aurora/site/source/documentation/latest/features/constraints.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/constraints.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/constraints.md (added) +++ aurora/site/source/documentation/latest/features/constraints.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,126 @@ +Scheduling Constraints +====================== + +By default, Aurora will pick any random slave with sufficient resources +in order to schedule a task. This scheduling choice can be further +restricted with the help of constraints. + + +Mesos Attributes +---------------- + +Data centers are often organized with hierarchical failure domains. Common failure domains +include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag +the Mesos slave with them as +[attributes](https://mesos.apache.org/documentation/attributes-resources/). + +The Mesos slave `--attributes` command line argument can be used to mark slaves with +static key/value pairs, so called attributes (not to be confused with `--resources`, which are +dynamic and accounted). + +For example, consider the host `cluster1-aaa-03-sr2` and its following attributes (given in +key:value format): `host:cluster1-aaa-03-sr2` and `rack:aaa`. + +Aurora makes these attributes available for matching with scheduling constraints. + + +Limit Constraints +----------------- + +Limit constraints allow to control machine diversity using constraints. The below +constraint ensures that no more than two instances of your job may run on a single host. +Think of this as a "group by" limit. + + Service( + name = 'webservice', + role = 'www-data', + constraints = { + 'host': 'limit:2', + } + ... + ) + + +Likewise, you can use constraints to control rack diversity, e.g. at +most one task per rack: + + constraints = { + 'rack': 'limit:1', + } + +Use these constraints sparingly as they can dramatically reduce Tasks' schedulability. 
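For illustration only, here is a sketch of a complete job that combines both of the above limits,
assuming the `rack` attribute is configured on the slaves as described earlier; the process,
resource values, and instance count are placeholders:

    hello = Process(name = 'hello', cmdline = 'echo hello && sleep 60')

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'prod',
        role = 'www-data',
        name = 'webservice',
        instances = 8,
        task = SequentialTask(
          processes = [hello],
          resources = Resources(cpu = 0.5, ram = 128*MB, disk = 128*MB)),
        # Spread instances: at most 2 per host and at most 1 per rack.
        constraints = {
          'host': 'limit:2',
          'rack': 'limit:1',
        },
      )
    ]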
+Further details are available in the reference documentation on +[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints). + + + +Value Constraints +----------------- + +Value constraints can be used to express that a certain attribute with a certain value +should be present on a Mesos slave. For example, the following job would only be +scheduled on nodes that claim to have an `SSD` as their disk. + + Service( + name = 'webservice', + role = 'www-data', + constraints = { + 'disk': 'SSD', + } + ... + ) + + +Further details are available in the reference documentation on +[Scheduling Constraints](../reference/configuration.md#specifying-scheduling-constraints). + + +Running stateful services +------------------------- + +Aurora is best suited to run stateless applications, but it also accommodates for stateful services +like databases, or services that otherwise need to always run on the same machines. + +### Dedicated attribute + +Most of the Mesos attributes arbitrary and available for custom use. There is one exception, +though: the `dedicated` attribute. Aurora treats this specially, and only allows matching jobs to +run on these machines, and will only schedule matching jobs on these machines. + + +#### Syntax +The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created, +the scheduler requires that the `$role` component matches the `role` field in the job +configuration, and will reject the job creation otherwise. The remainder of the attribute is +free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not +enforce this. For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as +`www-data/web.multi` will have its tasks scheduled only on Mesos slaves configured with: +`--attributes=dedicated:www-data/web.multi`. + +A wildcard (`*`) may be used for the role portion of the dedicated attribute, which will allow any +owner to elect for a job to run on the host(s). For example: tasks from both +`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint +formatted as `*/web.multi` will be scheduled only on Mesos slaves configured with +`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of +machines sharing the same set of traits or requirements. + +##### Example +Consider the following slave command line: + + mesos-slave --attributes="dedicated:db_team/redis" ... + +And this job configuration: + + Service( + name = 'redis', + role = 'db_team', + constraints = { + 'dedicated': 'db_team/redis' + } + ... + ) + +The job configuration is indicating that it should only be scheduled on slaves with the attribute +`dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do _not_ have that +constraint from running on those slaves. + Added: aurora/site/source/documentation/latest/features/containers.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/containers.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/containers.md (added) +++ aurora/site/source/documentation/latest/features/containers.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,43 @@ +Containers +========== + + +Docker +------ + +Aurora has optional support for launching Docker containers, if correctly [configured by an Operator](../operations/configuration.md#docker-containers). 

Example (available in the [Vagrant environment](../getting-started/vagrant.md)):

    $ cat /vagrant/examples/jobs/docker/hello_docker.aurora
    hello_docker = Process(
      name = 'hello',
      cmdline = """
        while true; do
          echo hello world
          sleep 10
        done
      """)

    hello_world_docker = Task(
      name = 'hello docker',
      processes = [hello_docker],
      resources = Resources(cpu = 1, ram = 1*MB, disk = 8*MB)
    )

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'devel',
        role = 'docker-test',
        name = 'hello_docker',
        task = hello_world_docker,
        container = Container(docker = Docker(image = 'python:2.7'))
      )
    ]

In order to correctly execute processes inside a job, the docker container must have Python 2.7
installed. Further details of how to use Docker can be found in the
[Reference Documentation](../reference/configuration.md#docker-object).

Added: aurora/site/source/documentation/latest/features/cron-jobs.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/cron-jobs.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/features/cron-jobs.md (added)
+++ aurora/site/source/documentation/latest/features/cron-jobs.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,124 @@
# Cron Jobs

Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.

- [Overview](#overview)
- [Collision Policies](#collision-policies)
- [Failure recovery](#failure-recovery)
- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
  - [cron schedule](#cron-schedule)
  - [cron deschedule](#cron-deschedule)
  - [cron start](#cron-start)
  - [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
- [Technical Note About Syntax](#technical-note-about-syntax)
- [Caveats](#caveats)
  - [Failovers](#failovers)
  - [Collision policy is best-effort](#collision-policy-is-best-effort)
  - [Timezone Configuration](#timezone-configuration)

## Overview

A job is identified as a cron job by the presence of a
`cron_schedule` attribute containing a cron-style schedule in the
[`Job`](../reference/configuration.md#job-objects) object. Examples of cron schedules
include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`0 17 * * FRI`), and
"the 1st and 15th day of the month at 03:00" (`0 3 1,15 * *`).

Example (available in the [Vagrant environment](../getting-started/vagrant.md)):

    $ cat /vagrant/examples/jobs/cron_hello_world.aurora
    # A cron job that runs every 5 minutes.
    jobs = [
      Job(
        cluster = 'devcluster',
        role = 'www-data',
        environment = 'test',
        name = 'cron_hello_world',
        cron_schedule = '*/5 * * * *',
        task = SimpleTask(
          'cron_hello_world',
          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
      ),
    ]

## Collision Policies

The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
triggered while an older run hasn't finished. The scheduler has two policies available:

* `KILL_EXISTING`: The default policy - on a collision the old instances are killed and new
instances with the current configuration are started.
* `CANCEL_NEW`: On a collision the new run is cancelled.

Note that the use of `CANCEL_NEW` is likely a code smell - interrupted cron jobs should be able
to recover their progress on a subsequent invocation, otherwise they risk having their work queue
grow faster than they can process it.
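To make the chosen policy explicit, `cron_collision_policy` is set directly on the
[`Job`](../reference/configuration.md#job-objects) object. A hedged sketch; the job name and
command are hypothetical, and `CANCEL_NEW` is only appropriate if a skipped run can catch up later:

    jobs = [
      Job(
        cluster = 'devcluster',
        role = 'www-data',
        environment = 'test',
        name = 'cron_nightly_report',
        cron_schedule = '0 3 * * *',           # daily at 03:00
        # Skip a run if the previous one is still active instead of killing it.
        cron_collision_policy = 'CANCEL_NEW',
        task = SimpleTask('cron_nightly_report', './run_nightly_report.sh'),
      ),
    ]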
+ +## Failure recovery + +Unlike with services, which aurora will always re-execute regardless of exit status, instances of +cron jobs retry according to the `max_task_failures` attribute of the +[Task](../reference/configuration.md#task-object) object. To get "run-until-success" semantics, +set `max_task_failures` to `-1`. + +## Interacting with cron jobs via the Aurora CLI + +Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h` +for up-to-date usage instructions. + +### cron schedule +Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template +with a new one. Only future runs will be affected, any existing active tasks are left intact. + + $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora + +### cron deschedule +Deschedules a cron job, preventing future runs but allowing current runs to complete. + + $ aurora cron deschedule devcluster/www-data/test/cron_hello_world + +### cron start +Start a cron job immediately, outside of its normal cron schedule. + + $ aurora cron start devcluster/www-data/test/cron_hello_world + +### job killall, job restart, job kill +Cron jobs create instances running on the cluster that you can interact with like normal Aurora +tasks with `job kill` and `job restart`. + + +## Technical Note About Syntax + +`cron_schedule` uses a restricted subset of BSD crontab syntax. While the +execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD +[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax. See +[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124) +for details. + + +## Caveats + +### Failovers +No failover recovery. Aurora does not record the latest minute it fired +triggers for across failovers. Therefore it's possible to miss triggers +on failover. Note that this behavior may change in the future. + +It's necessary to sync time between schedulers with something like `ntpd`. +Clock skew could cause double or missed triggers in the case of a failover. + +### Collision policy is best-effort +Aurora aims to always have *at least one copy* of a given instance running at a time - it's +an AP system, meaning it chooses Availability and Partition Tolerance at the expense of +Consistency. + +If your collision policy was `CANCEL_NEW` and a task has terminated but +Aurora has not noticed this Aurora will go ahead and create your new +task. + +If your collision policy was `KILL_EXISTING` and a task was marked `LOST` +but not yet GCed Aurora will go ahead and create your new task without +attempting to kill the old one (outside the GC interval). + +### Timezone Configuration +Cron timezone is configured indepdendently of JVM timezone with the `-cron_timezone` flag and +defaults to UTC. Added: aurora/site/source/documentation/latest/features/job-updates.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/job-updates.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/job-updates.md (added) +++ aurora/site/source/documentation/latest/features/job-updates.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,111 @@ +Aurora Job Updates +================== + +`Job` configurations can be updated at any point in their lifecycle. 
+Usually updates are done incrementally using a process called a *rolling +upgrade*, in which Tasks are upgraded in small groups, one group at a +time. Updates are done using various Aurora Client commands. + + +Rolling Job Updates +------------------- + +There are several sub-commands to manage job updates: + + aurora update start <job key> <configuration file> + aurora update info <job key> + aurora update pause <job key> + aurora update resume <job key> + aurora update abort <job key> + aurora update list <cluster> + +When you `start` a job update, the command will return once it has sent the +instructions to the scheduler. At that point, you may view detailed +progress for the update with the `info` subcommand, in addition to viewing +graphical progress in the web browser. You may also get a full listing of +in-progress updates in a cluster with `list`. + +Once an update has been started, you can `pause` to keep the update but halt +progress. This can be useful for doing things like debug a partially-updated +job to determine whether you would like to proceed. You can `resume` to +proceed. + +You may `abort` a job update regardless of the state it is in. This will +instruct the scheduler to completely abandon the job update and leave the job +in the current (possibly partially-updated) state. + +For a configuration update, the Aurora Client calculates required changes +by examining the current job config state and the new desired job config. +It then starts a *rolling batched update process* by going through every batch +and performing these operations: + +- If an instance is present in the scheduler but isn't in the new config, + then that instance is killed. +- If an instance is not present in the scheduler but is present in + the new config, then the instance is created. +- If an instance is present in both the scheduler and the new config, then + the client diffs both task configs. If it detects any changes, it + performs an instance update by killing the old config instance and adds + the new config instance. + +The Aurora client continues through the instance list until all tasks are +updated, in `RUNNING,` and healthy for a configurable amount of time. +If the client determines the update is not going well (a percentage of health +checks have failed), it cancels the update. + +Update cancellation runs a procedure similar to the described above +update sequence, but in reverse order. New instance configs are swapped +with old instance configs and batch updates proceed backwards +from the point where the update failed. E.g.; (0,1,2) (3,4,5) (6,7, +8-FAIL) results in a rollback in order (8,7,6) (5,4,3) (2,1,0). + +For details how to control a job update, please see the +[UpdateConfig](../reference/configuration.md#updateconfig-objects) configuration object. + + +Coordinated Job Updates +------------------------ + +Some Aurora services may benefit from having more control over updates by explicitly +acknowledging ("heartbeating") job update progress. This may be helpful for mission-critical +service updates where explicit job health monitoring is vital during the entire job update +lifecycle. Such job updates would rely on an external service (or a custom client) periodically +pulsing an active coordinated job update via a +[pulseJobUpdate RPC](../../api/src/main/thrift/org/apache/aurora/gen/api.thrift). + +A coordinated update is defined by setting a positive +[pulse_interval_secs](../reference/configuration.md#updateconfig-objects) value in job configuration +file. 
If no pulses are received within specified interval the update will be blocked. A blocked +update is unable to continue rolling forward (or rolling back) but retains its active status. +It may only be unblocked by a fresh `pulseJobUpdate` call. + +NOTE: A coordinated update starts in `ROLL_FORWARD_AWAITING_PULSE` state and will not make any +progress until the first pulse arrives. However, a paused update (`ROLL_FORWARD_PAUSED` or +`ROLL_BACK_PAUSED`) is still considered active and upon resuming will immediately make progress +provided the pulse interval has not expired. + + +Canary Deployments +------------------ + +Canary deployments are a pattern for rolling out updates to a subset of job instances, +in order to test different code versions alongside the actual production job. +It is a risk-mitigation strategy for job owners and commonly used in a form where +job instance 0 runs with a different configuration than the instances 1-N. + +For example, consider a job with 4 instances that each +request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified +in the configuration file `hello_world.aurora`. If you want to +update it so it requests 2 GB of RAM instead of 1. You can create a new +configuration file to do that called `new_hello_world.aurora` and +issue + + aurora update start <job_key_value>/0-1 new_hello_world.aurora + +This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space, +while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3 +dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space. + +So that means there are two simultaneous task configurations for the same job +at the same time, just valid for different ranges of instances. While this isn't a recommended +pattern, it is valid and supported by the Aurora scheduler. Added: aurora/site/source/documentation/latest/features/multitenancy.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/multitenancy.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/multitenancy.md (added) +++ aurora/site/source/documentation/latest/features/multitenancy.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,62 @@ +Multitenancy +============ + +Aurora is a multi-tenant system that can run jobs of multiple clients/tenants. +Going beyond the [resource isolation on an individual host](resource-isolation.md), it is +crucial to prevent those jobs from stepping on each others toes. + + +Job Namespaces +-------------- + +The namespace for jobs in Aurora follows a hierarchical structure. This is meant to make it easier +to differentiate between different jobs. A job key consists of four parts. The four parts are +`<cluster>/<role>/<environment>/<jobname>` in that order: + +* Cluster refers to the name of a particular Aurora installation. +* Role names are user accounts. +* Environment names are namespaces. +* Jobname is the custom name of your job. + +Role names correspond to user accounts. They are used for +[authentication](../operations/security.md), as the linux user used to run jobs, and for the +assignment of [quota](#preemption). If you don't know what accounts are available, contact your +sysadmin. + +The environment component in the job key, serves as a namespace. 
The values for +environment are validated in the client and the scheduler so as to allow any of `devel`, `test`, +`production`, and any value matching the regular expression `staging[0-9]*`. + +None of the values imply any difference in the scheduling behavior. Conventionally, the +"environment" is set so as to indicate a certain level of stability in the behavior of the job +by ensuring that an appropriate level of testing has been performed on the application code. e.g. +in the case of a typical Job, releases may progress through the following phases in order of +increasing level of stability: `devel`, `test`, `staging`, `production`. + + +Preemption +---------- + +In order to guarantee that important production jobs are always running, Aurora supports +preemption. + +Let's consider we have a pending job that is candidate for scheduling but resource shortage pressure +prevents this. Active tasks can become the victim of preemption, if: + + - both candidate and victim are owned by the same role and the + [priority](../reference/configuration.md#job-objects) of a victim is lower than the + [priority](../reference/configuration.md#job-objects) of the candidate. + - OR a victim is non-[production](../reference/configuration.md#job-objects) and the candidate is + [production](../reference/configuration.md#job-objects). + +In other words, tasks from [production](../reference/configuration.md#job-objects) jobs may preempt +tasks from any non-production job. However, a production task may only be preempted by tasks from +production jobs in the same role with higher [priority](../reference/configuration.md#job-objects). + +Aurora requires resource quotas for [production non-dedicated jobs](../reference/configuration.md#job-objects). +Quota is enforced at the job role level and when set, defines a non-preemptible pool of compute resources within +that role. All job types (service, adhoc or cron) require role resource quota unless a job has +[dedicated constraint set](constraints.md#dedicated-attribute). + +To grant quota to a particular role in production, an operator can use the command +`aurora_admin set_quota`. Added: aurora/site/source/documentation/latest/features/resource-isolation.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/resource-isolation.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/resource-isolation.md (added) +++ aurora/site/source/documentation/latest/features/resource-isolation.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,167 @@ +Resources Isolation and Sizing +============================== + +- [Isolation](#isolation) +- [Sizing](#sizing) +- [Oversubscription](#oversubscription) + + +Isolation +--------- + +Aurora is a multi-tenant system; a single software instance runs on a +server, serving multiple clients/tenants. To share resources among +tenants, it implements isolation of: + +* CPU +* memory +* disk space + +CPU is a soft limit, and handled differently from memory and disk space. +Too low a CPU value results in throttling your application and +slowing it down. Memory and disk space are both hard limits; when your +application goes over these values, it's killed. + +### CPU Isolation + +Mesos uses a quota based CPU scheduler (the *Completely Fair Scheduler*) +to provide consistent and predictable performance. 
This is effectively +a guarantee of resources -- you receive at least what you requested, but +also no more than you've requested. + +The scheduler gives applications a CPU quota for every 100 ms interval. +When an application uses its quota for an interval, it is throttled for +the rest of the 100 ms. Usage resets for each interval and unused +quota does not carry over. + +For example, an application specifying 4.0 CPU has access to 400 ms of +CPU time every 100 ms. This CPU quota can be used in different ways, +depending on the application and available resources. Consider the +scenarios shown in this diagram. + + + +* *Scenario A*: the application can use up to 4 cores continuously for +every 100 ms interval. It is never throttled and starts processing +new requests immediately. + +* *Scenario B* : the application uses up to 8 cores (depending on +availability) but is throttled after 50 ms. The CPU quota resets at the +start of each new 100 ms interval. + +* *Scenario C* : is like Scenario A, but there is a garbage collection +event in the second interval that consumes all CPU quota. The +application throttles for the remaining 75 ms of that interval and +cannot service requests until the next interval. In this example, the +garbage collection finished in one interval but, depending on how much +garbage needs collecting, it may take more than one interval and further +delay service of requests. + +*Technical Note*: Mesos considers logical cores, also known as +hyperthreading or SMT cores, as the unit of CPU. + +### Memory Isolation + +Mesos uses dedicated memory allocation. Your application always has +access to the amount of memory specified in your configuration. The +application's memory use is defined as the sum of the resident set size +(RSS) of all processes in a shard. Each shard is considered +independently. + +In other words, say you specified a memory size of 10GB. Each shard +would receive 10GB of memory. If an individual shard's memory demands +exceed 10GB, that shard is killed, but the other shards continue +working. + +*Technical note*: Total memory size is not enforced at allocation time, +so your application can request more than its allocation without getting +an ENOMEM. However, it will be killed shortly after. + +### Disk Space + +Disk space used by your application is defined as the sum of the files' +disk space in your application's directory, including the `stdout` and +`stderr` logged from your application. Each shard is considered +independently. You should use off-node storage for your application's +data whenever possible. + +In other words, say you specified disk space size of 100MB. Each shard +would receive 100MB of disk space. If an individual shard's disk space +demands exceed 100MB, that shard is killed, but the other shards +continue working. + +After your application finishes running, its allocated disk space is +reclaimed. Thus, your job's final action should move any disk content +that you want to keep, such as logs, to your home file system or other +less transitory storage. Disk reclamation takes place an undefined +period after the application finish time; until then, the disk contents +are still available but you shouldn't count on them being so. + +*Technical note* : Disk space is not enforced at write so your +application can write above its quota without getting an ENOSPC, but it +will be killed shortly after. This is subject to change. 
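For reference, all three of the limits described above are requested per task through the
`Resources` object in the job configuration. A minimal sketch; the process command and the sizing
values are purely illustrative:

    server = Process(name = 'server', cmdline = './run_server.sh')

    task = Task(
      name = 'webservice',
      processes = [server],
      # CPU is a soft limit (throttling above 1.5 cores); RAM and disk are hard limits
      # (the instance is killed if it exceeds them).
      resources = Resources(cpu = 1.5, ram = 2*GB, disk = 4*GB)
    )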
+ +### Other Resources + +Other resources, such as network bandwidth, do not have any performance +guarantees. For some resources, such as memory bandwidth, there are no +practical sharing methods so some application combinations collocated on +the same host may cause contention. + + +Sizing +------- + +### CPU Sizing + +To correctly size Aurora-run Mesos tasks, specify a per-shard CPU value +that lets the task run at its desired performance when at peak load +distributed across all shards. Include reserve capacity of at least 50%, +possibly more, depending on how critical your service is (or how +confident you are about your original estimate : -)), ideally by +increasing the number of shards to also improve resiliency. When running +your application, observe its CPU stats over time. If consistently at or +near your quota during peak load, you should consider increasing either +per-shard CPU or the number of shards. + +## Memory Sizing + +Size for your application's peak requirement. Observe the per-instance +memory statistics over time, as memory requirements can vary over +different periods. Remember that if your application exceeds its memory +value, it will be killed, so you should also add a safety margin of +around 10-20%. If you have the ability to do so, you may also want to +put alerts on the per-instance memory. + +## Disk Space Sizing + +Size for your application's peak requirement. Rotate and discard log +files as needed to stay within your quota. When running a Java process, +add the maximum size of the Java heap to your disk space requirement, in +order to account for an out of memory error dumping the heap +into the application's sandbox space. + + +Oversubscription +---------------- + +**WARNING**: This feature is currently in alpha status. Do not use it in production clusters! + +Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/) +by oversubscribing machine resources by the amount deemed safe to not affect the existing +non-revocable tasks. Aurora now supports revocable jobs via a `tier` setting set to `revocable` +value. + +The Aurora scheduler must be configured to receive revocable offers from Mesos and accept revocable +jobs. If not configured properly revocable tasks will never get assigned to hosts and will stay in +`PENDING`. Set these scheduler flag to allow receiving revocable Mesos offers: + + -receive_revocable_resources=true + +Specify a tier configuration file path (unless you want to use the [default](../../src/main/resources/org/apache/aurora/scheduler/tiers.json)): + + -tier_config=path/to/tiers/config.json + + +See the [Configuration Reference](../references/configuration.md) for details on how to mark a job +as being revocable. Added: aurora/site/source/documentation/latest/features/service-discovery.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/service-discovery.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/service-discovery.md (added) +++ aurora/site/source/documentation/latest/features/service-discovery.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,44 @@ +Service Discovery +================= + +It is possible for the Aurora executor to announce tasks into ServerSets for +the purpose of service discovery. 
ServerSets use the Zookeeper [group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox) +of which there are several reference implementations: + + - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp) + - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221) + - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51) + +These can also be used natively in Finagle using the [ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala). + +For more information about how to configure announcing, see the [Configuration Reference](../reference/configuration.md). + +Using Mesos DiscoveryInfo +------------------------- +Experimental support for populating DiscoveryInfo in Mesos is introduced in Aurora. This can be used to build +custom service discovery system not using zookeeper. Please see `Service Discovery` section in +[Mesos Framework Development guide](http://mesos.apache.org/documentation/latest/app-framework-development-guide/) for +explanation of the protobuf message in Mesos. + +To use this feature, please enable `--populate_discovery_info` flag on scheduler. All jobs started by scheduler +afterwards will have their portmap populated to Mesos and discoverable in `/state` endpoint in Mesos master and agent. + +### Using Mesos DNS +An example is using [Mesos-DNS](https://github.com/mesosphere/mesos-dns), which is able to generate multiple DNS +records. With current implementation, the example job with key `devcluster/vagrant/test/http-example` generates at +least the following: + +1. An A record for `http_example.test.vagrant.twitterscheduler.mesos` (which only includes IP address); +2. A [SRV record](https://en.wikipedia.org/wiki/SRV_record) for + `_http_example.test.vagrant._tcp.twitterscheduler.mesos`, which includes IP address and every port. This should only + be used if the service has one port. +3. A SRV record `_{port-name}._http_example.test.vagrant._tcp.twitterscheduler.mesos` for each port name + defined. This should be used when the service has multiple ports. + +Things to note: + +1. The domain part (".mesos" in above example) can be configured in [Mesos DNS](http://mesosphere.github.io/mesos-dns/docs/configuration-parameters.html); +2. The `twitterscheduler` part is the lower-case of framework name, which is not configurable right now (see + [TWITTER_SCHEDULER_NAME](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/mesos/CommandLineDriverSettingsModule.java#L98)); +3. Right now, portmap and port aliases in announcer object are not reflected in DiscoveryInfo, therefore not visible in + Mesos DNS records either. This is because they are only resolved in thermos executors. 
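For context, the ServerSet announcing described at the top of this page is opted into per job via
the `Announcer` object in the job configuration; see the
[Configuration Reference](../reference/configuration.md) for the authoritative field list. A
minimal sketch with hypothetical names and values:

    jobs = [
      Service(
        cluster = 'devcluster',
        environment = 'prod',
        role = 'www-data',
        name = 'http_example',
        task = SequentialTask(
          processes = [
            Process(name = 'server',
                    cmdline = './run_server.sh --port {{thermos.ports[http]}}')],
          resources = Resources(cpu = 0.5, ram = 128*MB, disk = 128*MB)),
        # Register the dynamically assigned 'http' port in the ZooKeeper ServerSet.
        announce = Announcer(primary_port = 'http'),
      )
    ]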
\ No newline at end of file Added: aurora/site/source/documentation/latest/features/services.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/services.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/services.md (added) +++ aurora/site/source/documentation/latest/features/services.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,99 @@ +Long-running Services +===================== + +Jobs that are always restart on completion, whether successful or unsuccessful, +are called services. This is useful for long-running processes +such as webservices that should always be running, unless stopped explicitly. + + +Service Specification +--------------------- + +A job is identified as a service by the presence of the flag +``service=True` in the [`Job`](../reference/configuration.md#job-objects) object. +The `Service` alias can be used as shorthand for `Job` with `service=True`. + +Example (available in the [Vagrant environment](../getting-started/vagrant.md)): + + $ cat /vagrant/examples/jobs/hello_world.aurora + hello = Process( + name = 'hello', + cmdline = """ + while true; do + echo hello world + sleep 10 + done + """) + + task = SequentialTask( + processes = [hello], + resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB) + ) + + jobs = [ + Service( + task = task, + cluster = 'devcluster', + role = 'www-data', + environment = 'prod', + name = 'hello' + ) + ] + + +Jobs without the service bit set only restart up to `max_task_failures` times and only if they +terminated unsuccessfully either due to human error or machine failure (see the +[`Job`](../reference/configuration.md#job-objects) object for details). + + +Ports +----- + +In order to be useful, most services have to bind to one or more ports. Aurora enables this +usecase via the [`thermos.ports` namespace](../reference/configuration.md#thermos-namespace) that +allows to request arbitrarily named ports: + + + nginx = Process( + name = 'nginx', + cmdline = './run_nginx.sh -port {{thermos.ports[http]}}' + ) + + +When this process is included in a job, the job will be allocated a port, and the command line +will be replaced with something like: + + ./run_nginx.sh -port 42816 + +Where 42816 happens to be the allocated port. + +For details on how to enable clients to discover this dynamically assigned port, see our +[Service Discovery](service-discovery.md) documentation. + + +Health Checking +--------------- + +Typically, the Thermos executor monitors processes within a task only by liveness of the forked +process. In addition to that, Aurora has support for rudimentary health checking: Either via HTTP +via custom shell scripts. + +For example, simply by requesting a `health` port, a process can request to be health checked +via repeated calls to the `/health` endpoint: + + nginx = Process( + name = 'nginx', + cmdline = './run_nginx.sh -port {{thermos.ports[health]}}' + ) + +Please see the +[configuration reference](../reference/configuration.md#user-content-healthcheckconfig-objects) +for configuration options for this feature. + +You can pause health checking by touching a file inside of your sandbox, named `.healthchecksnooze`. +As long as that file is present, health checks will be disabled, enabling users to gather core +dumps or other performance measurements without worrying about Aurora's health check killing +their process. 
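As an aside, a minimal sketch of how a maintenance script running inside the sandbox might toggle
that file; the path is relative to the sandbox root and the surrounding workflow is hypothetical:

    from pathlib import Path

    snooze = Path('.healthchecksnooze')  # relative to the task sandbox
    snooze.touch()                       # health checks are skipped while this file exists
    # ... gather core dumps or other measurements here ...
    snooze.unlink()                      # re-enable health checking when finished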
+ +WARNING: Remember to remove this when you are done, otherwise your instance will have permanently +disabled health checks. Added: aurora/site/source/documentation/latest/features/sla-metrics.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/features/sla-metrics.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/features/sla-metrics.md (added) +++ aurora/site/source/documentation/latest/features/sla-metrics.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,178 @@ +Aurora SLA Measurement +====================== + +- [Overview](#overview) +- [Metric Details](#metric-details) + - [Platform Uptime](#platform-uptime) + - [Job Uptime](#job-uptime) + - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\)) + - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\)) +- [Limitations](#limitations) + +## Overview + +The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level +Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform +and hosted services. + +The Aurora SLA feature is by default only enabled for service (non-cron) +production jobs (`"production=True"` in your `.aurora` config). It can be enabled for +non-production services by an operator via the scheduler command line flag `-sla_non_prod_metrics`. + +Counters that track SLA measurements are computed periodically within the scheduler. +The individual instance metrics are refreshed every minute (configurable via +`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by +relevant grouping types before exporting to scheduler `/vars` endpoint (when using `vagrant` +that would be `http://192.168.33.7:8081/vars`) + + +## Metric Details + +### Platform Uptime + +*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability +or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any +system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts +will not degrade this metric.* + +**Collection scope:** + +* Per job - `sla_<job_key>_platform_uptime_percent` +* Per cluster - `sla_cluster_platform_uptime_percent` + +**Units:** percent + +A fault in the task environment may cause the Aurora/Mesos to have different views on the task state +or lose track of the task existence. In such cases, the service task is marked as LOST and +rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING +for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between +task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime. + +Another example of a platform downtime event is the administrator-requested task rescheduling. This +happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and +rescheduled elsewhere. + +To accurately calculate Platform Uptime, we must separate platform incurred downtime from user +actions that put a service instance in a non-operational state. It is simpler to isolate +user-incurred downtime and treat all other downtime as platform incurred. + +Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks` +or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail +relevant to uptime calculations. 
By applying a special "SLA meaning" to exposed task state +transition records, we can build a deterministic downtime trace for every given service instance. + +A task going through a state transition carries one of three possible SLA meanings +(see [SlaAlgorithm.java](../../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for +sla-to-task-state mapping): + +* Task is UP: starts a period where the task is considered to be up and running from the Aurora + platform standpoint. + +* Task is DOWN: starts a period where the task cannot reach the UP state for some + non-user-related reason. Counts towards instance downtime. + +* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to + user initiated action or failure. We ignore this period for the uptime calculation purposes. + +This metric is recalculated over the last sampling period (last minute) to account for +any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the +sampling interval as well as adjacent REMOVED events. + +### Job Uptime + +*Percentage of the job instances considered to be in RUNNING state for the specified duration +relative to request time. This is a purely application side metric that is considering aggregate +uptime of all RUNNING instances. Any user- or platform initiated restarts directly affect +this metric.* + +**Collection scope:** We currently expose job uptime values at 5 pre-defined +percentiles (50th,75th,90th,95th and 99th): + +* `sla_<job_key>_job_uptime_50_00_sec` +* `sla_<job_key>_job_uptime_75_00_sec` +* `sla_<job_key>_job_uptime_90_00_sec` +* `sla_<job_key>_job_uptime_95_00_sec` +* `sla_<job_key>_job_uptime_99_00_sec` + +**Units:** seconds +You can also get customized real-time stats from aurora client. See `aurora sla -h` for +more details. + +### Median Time To Assigned (MTTA) + +*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined +metric that helps track the dependency of scheduling performance on the requested resources +(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).* + +**Collection scope:** + +* Per job - `sla_<job_key>_mtta_ms` +* Per cluster - `sla_cluster_mtta_ms` +* Per instance size (small, medium, large, x-large, xx-large). Size are defined in: +[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java) + * By CPU: + * `sla_cpu_small_mtta_ms` + * `sla_cpu_medium_mtta_ms` + * `sla_cpu_large_mtta_ms` + * `sla_cpu_xlarge_mtta_ms` + * `sla_cpu_xxlarge_mtta_ms` + * By RAM: + * `sla_ram_small_mtta_ms` + * `sla_ram_medium_mtta_ms` + * `sla_ram_large_mtta_ms` + * `sla_ram_xlarge_mtta_ms` + * `sla_ram_xxlarge_mtta_ms` + * By DISK: + * `sla_disk_small_mtta_ms` + * `sla_disk_medium_mtta_ms` + * `sla_disk_large_mtta_ms` + * `sla_disk_xlarge_mtta_ms` + * `sla_disk_xxlarge_mtta_ms` + +**Units:** milliseconds + +MTTA only considers instances that have already reached ASSIGNED state and ignores those +that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource +constraints) do not affect metric curves. + +### Median Time To Running (MTTR) + +*Median time a job waits for its tasks to reach RUNNING state. 
This is a comprehensive metric
+reflecting the overall time it takes for Aurora/Mesos to start executing user content.*
+
+**Collection scope:**
+
+* Per job - `sla_<job_key>_mttr_ms`
+* Per cluster - `sla_cluster_mttr_ms`
+* Per instance size (small, medium, large, x-large, xx-large). Sizes are defined in:
+[ResourceAggregates.java](../../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java)
+  * By CPU:
+    * `sla_cpu_small_mttr_ms`
+    * `sla_cpu_medium_mttr_ms`
+    * `sla_cpu_large_mttr_ms`
+    * `sla_cpu_xlarge_mttr_ms`
+    * `sla_cpu_xxlarge_mttr_ms`
+  * By RAM:
+    * `sla_ram_small_mttr_ms`
+    * `sla_ram_medium_mttr_ms`
+    * `sla_ram_large_mttr_ms`
+    * `sla_ram_xlarge_mttr_ms`
+    * `sla_ram_xxlarge_mttr_ms`
+  * By DISK:
+    * `sla_disk_small_mttr_ms`
+    * `sla_disk_medium_mttr_ms`
+    * `sla_disk_large_mttr_ms`
+    * `sla_disk_xlarge_mttr_ms`
+    * `sla_disk_xxlarge_mttr_ms`
+
+**Units:** milliseconds
+
+MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with
+unreasonable resource constraints) do not affect metric curves.
+
+## Limitations
+
+* The availability of Aurora SLA metrics is bound by the scheduler's availability.
+
+* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
+  Scheduler restarts may result in missed collections.

Added: aurora/site/source/documentation/latest/getting-started/overview.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/overview.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/overview.md (added)
+++ aurora/site/source/documentation/latest/getting-started/overview.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,110 @@
+Aurora System Overview
+======================
+
+Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability,
+fault-tolerance, and resource isolation.
+
+
+Components
+----------
+
+It is important to have an understanding of the components that make up
+a functioning Aurora cluster.
+
+
+
+* **Aurora scheduler**
+  The scheduler is your primary interface to the work you run in your cluster. You will
+  instruct it to run jobs, and it will manage them in Mesos for you. You will also frequently use
+  the scheduler's read-only web interface as a heads-up display for what's running in your cluster.
+
+* **Aurora client**
+  The client (`aurora` command) is a command line tool that exposes primitives that you can use to
+  interact with the scheduler. The client operates on
+
+  Aurora also provides an admin client (`aurora_admin` command) that contains commands built for
+  cluster administrators. You can use this tool to do things like manage user quotas and manage
+  graceful maintenance on machines in the cluster.
+
+* **Aurora executor**
+  The executor (a.k.a. Thermos executor) is responsible for carrying out the workloads described in
+  the Aurora DSL (`.aurora` files). The executor is what actually executes user processes. It will
+  also perform health checking of tasks and register tasks in ZooKeeper for the purposes of dynamic
+  service discovery.
+
+* **Aurora observer**
+  The observer provides browser-based access to the status of individual tasks executing on worker
+  machines. It gives insight into the processes that are executing and facilitates browsing of
+  task sandbox directories.
+
+* **ZooKeeper**
+  [ZooKeeper](http://zookeeper.apache.org) is a distributed consensus system. In an Aurora cluster
+  it is used for reliable election of the leading Aurora scheduler and Mesos master. It is also
+  used as a vehicle for service discovery; see [Service Discovery](../features/service-discovery.md).
+
+* **Mesos master**
+  The master is responsible for tracking worker machines and performing accounting of their
+  resources. The scheduler interfaces with the master to control the cluster.
+
+* **Mesos agent**
+  The agent receives work assigned by the scheduler and executes it. It interfaces with Linux
+  isolation systems like cgroups, namespaces and Docker to manage the resource consumption of tasks.
+  When a user task is launched, the agent will launch the executor (in the context of a Linux cgroup
+  or Docker container depending upon the environment), which will in turn fork user processes.
+
+
+Jobs, Tasks and Processes
+--------------------------
+
+Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos
+cares about individual *tasks*, but typical jobs consist of dozens or
+hundreds of task replicas. Aurora provides a layer on top of Mesos with
+its `Job` abstraction. An Aurora `Job` consists of a task template and
+instructions for creating near-identical replicas of that task (modulo
+things like "instance id" or specific port numbers which may differ from
+machine to machine).
+
+How many tasks make up a Job varies. On a basic level, a Job consists of
+one task template and instructions for creating near-identical replicas of that task
+(otherwise referred to as "instances" or "shards").
+
+A task can merely be a single *process* corresponding to a single
+command line, such as `python2.7 my_script.py`. However, a task can also
+consist of many separate processes, which all run within a single
+sandbox. For example, a task might run multiple cooperating processes
+together, such as `logrotate`, `installer`, master, or slave processes. This is
+where Thermos comes in. While Aurora provides a `Job` abstraction on
+top of Mesos `Task`s, Thermos provides a `Process` abstraction
+underneath Mesos `Task`s and serves as part of the Aurora framework's
+executor.
+
+You define `Job`s, `Task`s, and `Process`es in a configuration file.
+Configuration files are written in Python, and make use of the
+[Pystachio](https://github.com/wickman/pystachio) templating language,
+along with specific Aurora, Mesos, and Thermos commands and methods.
+The configuration files typically end with a `.aurora` extension.
+
+Summary:
+
+* Aurora manages jobs made of tasks.
+* Mesos manages tasks made of processes.
+* Thermos manages processes.
+* All of the above is defined in `.aurora` configuration files.
+
+
+
+Each `Task` has a *sandbox* created when the `Task` starts and garbage
+collected when it finishes. All of a `Task`'s processes run in its
+sandbox, so processes can share state by using a shared current working
+directory.
+
+The sandbox garbage collection policy considers many factors, most
+importantly age and size. It makes a best-effort attempt to keep
+sandboxes around as long as possible post-task in order for service
+owners to inspect data and logs, should the `Task` have completed
+abnormally. But you should not design your applications assuming sandboxes
+will be around forever; instead, build log saving or other
+checkpointing mechanisms directly into your application or into your
+`Job` description.
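To make the Job / Task / Process layering and the shared sandbox concrete, here is a minimal sketch of a `.aurora` file (a hypothetical example, not one shipped with Aurora) in which two Processes share state simply by writing and reading a file in the Task's sandbox working directory. It uses the same DSL objects (`Process`, `SequentialTask`, `Resources`, `Job`) that the Hello World tutorial introduces:

```python
# Hypothetical .aurora sketch: two Processes sharing state through the sandbox.

# The first Process writes a file into the shared sandbox working directory...
producer = Process(
  name = 'producer',
  cmdline = 'echo "hello from the producer" > shared_state.txt')

# ...and the second Process reads that same file from the same directory.
consumer = Process(
  name = 'consumer',
  cmdline = 'cat shared_state.txt')

# Thermos runs the Processes in order; Mesos only sees the resulting Task.
shared_state_task = SequentialTask(
  processes = [producer, consumer],
  resources = Resources(cpu = 0.1, ram = 16*MB, disk = 8*MB))

# Aurora manages the Job, i.e. the near-identical replicas ("instances") of the Task.
jobs = [
  Job(cluster = 'devcluster',
      environment = 'devel',
      role = 'www-data',
      name = 'shared_state_example',
      instances = 1,
      task = shared_state_task)
]
```

Because both Processes run in the same sandbox, `consumer` can read what `producer` wrote without any explicit data transfer; once the Task finishes, that file is subject to the garbage collection policy described above.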
+ Added: aurora/site/source/documentation/latest/getting-started/tutorial.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/tutorial.md?rev=1739402&view=auto ============================================================================== --- aurora/site/source/documentation/latest/getting-started/tutorial.md (added) +++ aurora/site/source/documentation/latest/getting-started/tutorial.md Sat Apr 16 04:23:06 2016 @@ -0,0 +1,258 @@ +# Aurora Tutorial + +This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`") +a hello world program on Mesos. This is the recommended document for new Aurora users +to start getting up to speed on the system. + +- [Prerequisite](#setup-install-aurora) +- [The Script](#the-script) +- [Aurora Configuration](#aurora-configuration) +- [Creating the Job](#creating-the-job) +- [Watching the Job Run](#watching-the-job-run) +- [Cleanup](#cleanup) +- [Next Steps](#next-steps) + + +## Prerequisite + +This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md). +However, in general the instructions are also applicable to any other +[Aurora installation](../operations/installation.md). + +Unless otherwise stated, all commands are to be run from the root of the aurora +repository clone. + + +## The Script + +Our "hello world" application is a simple Python script that loops +forever, displaying the time every few seconds. Copy the code below and +put it in a file named `hello_world.py` in the root of your Aurora repository clone +(Note: this directory is the same as `/vagrant` inside the Vagrant VMs). + +The script has an intentional bug, which we will explain later on. + +<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh. +--> +```python +import time + +def main(): + SLEEP_DELAY = 10 + # Python experts - ignore this blatant bug. + for i in xrang(100): + print("Hello world! The time is now: %s. Sleeping for %d secs" % ( + time.asctime(), SLEEP_DELAY)) + time.sleep(SLEEP_DELAY) + +if __name__ == "__main__": + main() +``` + +## Aurora Configuration + +Once we have our script/program, we need to create a *configuration +file* that tells Aurora how to manage and launch our Job. Save the below +code in the file `hello_world.aurora`. + +<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh. +--> +```python +pkg_path = '/vagrant/hello_world.py' + +# we use a trick here to make the configuration change with +# the contents of the file, for simplicity. in a normal setting, packages would be +# versioned, and the version number would be changed in the configuration. +import hashlib +with open(pkg_path, 'rb') as f: + pkg_checksum = hashlib.md5(f.read()).hexdigest() + +# copy hello_world.py into the local sandbox +install = Process( + name = 'fetch_package', + cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum)) + +# run the script +hello_world = Process( + name = 'hello_world', + cmdline = 'python -u hello_world.py') + +# describe the task +hello_world_task = SequentialTask( + processes = [install, hello_world], + resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB)) + +jobs = [ + Service(cluster = 'devcluster', + environment = 'devel', + role = 'www-data', + name = 'hello_world', + task = hello_world_task) +] +``` + +There is a lot going on in that configuration file: + +1. From a "big picture" viewpoint, it first defines two +Processes. 
Then it defines a Task that runs the two Processes in the +order specified in the Task definition, as well as specifying what +computational and memory resources are available for them. Finally, +it defines a Job that will schedule the Task on available and suitable +machines. This Job is the sole member of a list of Jobs; you can +specify more than one Job in a config file. + +2. At the Process level, it specifies how to get your code into the +local sandbox in which it will run. It then specifies how the code is +actually run once the second Process starts. + +For more about Aurora configuration files, see the [Configuration +Tutorial](../reference/configuration-tutorial.md) and the [Configuration +Reference](../reference/configuration.md) (preferably after finishing this +tutorial). + + +## Creating the Job + +We're ready to launch our job! To do so, we use the Aurora Client to +issue a Job creation request to the Aurora scheduler. + +Many Aurora Client commands take a *job key* argument, which uniquely +identifies a Job. A job key consists of four parts, each separated by a +"/". The four parts are `<cluster>/<role>/<environment>/<jobname>` +in that order: + +* Cluster refers to the name of a particular Aurora installation. +* Role names are user accounts existing on the slave machines. If you +don't know what accounts are available, contact your sysadmin. +* Environment names are namespaces; you can count on `test`, `devel`, +`staging` and `prod` existing. +* Jobname is the custom name of your job. + +When comparing two job keys, if any of the four parts is different from +its counterpart in the other key, then the two job keys identify two separate +jobs. If all four values are identical, the job keys identify the same job. + +The `clusters.json` [client configuration](../reference/client-cluster-configuration.md) +for the Aurora scheduler defines the available cluster names. +For Vagrant, from the top-level of your Aurora repository clone, do: + + $ vagrant ssh + +Followed by: + + vagrant@aurora:~$ cat /etc/aurora/clusters.json + +You'll see something like the following. The `name` value shown here, corresponds to a job key's cluster value. + +```javascript +[{ + "name": "devcluster", + "zk": "192.168.33.7", + "scheduler_zk_path": "/aurora/scheduler", + "auth_mechanism": "UNAUTHENTICATED", + "slave_run_directory": "latest", + "slave_root": "/var/lib/mesos" +}] +``` + +The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as +specified by its job key and configuration file arguments and runs it. + + aurora job create <cluster>/<role>/<environment>/<jobname> <config_file> + +Or for our example: + + aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora + +After entering our virtual machine using `vagrant ssh`, this returns: + + vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora + INFO] Creating job hello_world + INFO] Checking status of devcluster/www-data/devel/hello_world + Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world + + +## Watching the Job Run + +Now that our job is running, let's see what it's doing. 
Access the
+scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler`,
+or, when using `vagrant`, at `http://192.168.33.7:8081/scheduler`.
+First we see what Jobs are scheduled:
+
+
+
+Click on your user name, which in this case was `www-data`, and we see the Jobs associated
+with that role:
+
+
+
+If you click on your `hello_world` Job, you'll see:
+
+
+
+Oops, looks like our first job didn't quite work! The task is temporarily throttled for
+having failed on every attempt of the Aurora scheduler to run it. We have to figure out
+what is going wrong.
+
+On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job.
+
+
+
+We can navigate to the Task page of a failed run by clicking on the host link.
+
+
+
+Once there, we see that the `hello_world` process failed. The Task page
+captures the standard error and standard output streams and makes them available.
+Clicking through to `stderr` on the failed `hello_world` process, we see what happened.
+
+
+
+It looks like we made a typo in our Python script. We wanted `xrange`,
+not `xrang`. Edit the `hello_world.py` script to use the correct function
+and save it as `hello_world_v2.py`. Then update the `hello_world.aurora`
+configuration to point at the new file.
+
+In order to try again, we can now instruct the scheduler to update our job:
+
+    vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora
+    INFO] Starting update for: hello_world
+    Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b
+
+This time, the task comes up.
+
+
+
+By again clicking on the host, we inspect the Task page and see that the
+`hello_world` process is running.
+
+
+
+We then inspect the output by clicking on `stdout` and see our process'
+output:
+
+
+
+## Cleanup
+
+Now that we're done, we kill the job using the Aurora client:
+
+    vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world
+    INFO] Killing tasks for job: devcluster/www-data/devel/hello_world
+    INFO] Instances to be killed: [0]
+    Successfully killed instances [0]
+    Job killall succeeded
+
+The job page now shows the `hello_world` tasks as completed.
+
+
+
+## Next Steps
+
+Now that you've finished this Tutorial, you should read or do the following:
+
+- [The Aurora Configuration Tutorial](../reference/configuration-tutorial.md), which provides more examples
+  and best practices for writing Aurora configurations. You should also look at
+  the [Aurora Configuration Reference](../reference/configuration.md).
+- Explore the Aurora Client: use `aurora -h`, and read the
+  [Aurora Client Commands](../reference/client-commands.md) document.

Added: aurora/site/source/documentation/latest/getting-started/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/getting-started/vagrant.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/getting-started/vagrant.md (added)
+++ aurora/site/source/documentation/latest/getting-started/vagrant.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,147 @@
+A local Cluster with Vagrant
+============================
+
+This document shows you how to configure a complete cluster using a virtual machine. This setup
+replicates a real cluster in your development machine as closely as possible.
After you complete +the steps outlined here, you will be ready to create and run your first Aurora job. + +The following sections describe these steps in detail: + +1. [Overview](#user-content-overview) +1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant) +1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository) +1. [Start the local cluster](#user-content-start-the-local-cluster) +1. [Log onto the VM](#user-content-log-onto-the-vm) +1. [Run your first job](#user-content-run-your-first-job) +1. [Rebuild components](#user-content-rebuild-components) +1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster) +1. [Troubleshooting](#user-content-troubleshooting) + + +Overview +-------- + +The Aurora distribution includes a set of scripts that enable you to create a local cluster in +your development machine. These scripts use [Vagrant](https://www.vagrantup.com/) and +[VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once the +virtual machine is running, the scripts install and initialize Aurora and any required components +to create the local cluster. + + +Install VirtualBox and Vagrant +------------------------------ + +First, download and install [VirtualBox](https://www.virtualbox.org/) on your development machine. + +Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the installation +was successful, open a terminal window and type the `vagrant` command. You should see a list of +common commands for this tool. + + +Clone the Aurora repository +--------------------------- + +To obtain the Aurora source distribution, clone its Git repository using the following command: + + git clone git://git.apache.org/aurora.git + + +Start the local cluster +----------------------- + +Now change into the `aurora/` directory, which contains the Aurora source code and +other scripts and tools: + + cd aurora/ + +To start the local cluster, type the following command: + + vagrant up + +This command uses the configuration scripts in the Aurora distribution to: + +* Download a Linux system image. +* Start a virtual machine (VM) and configure it. +* Install the required build tools on the VM. +* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and +[Zookeeper](http://zookeeper.apache.org/)) on the VM. +* Build and install Aurora from source on the VM. +* Start Aurora's services on the VM. + +This process takes several minutes to complete. + +You may notice a warning that guest additions in the VM don't match your version of VirtualBox. +This should generally be harmless, but you may wish to install a vagrant plugin to take care of +mismatches like this for you: + + vagrant plugin install vagrant-vbguest + +With this plugin installed, whenever you `vagrant up` the plugin will upgrade the guest additions +for you when a version mis-match is detected. You can read more about the plugin +[here](https://github.com/dotless-de/vagrant-vbguest). + +To verify that Aurora is running on the cluster, visit the following URLs: + +* Scheduler - http://192.168.33.7:8081 +* Observer - http://192.168.33.7:1338 +* Mesos Master - http://192.168.33.7:5050 +* Mesos Slave - http://192.168.33.7:5051 + + +Log onto the VM +--------------- + +To SSH into the VM, run the following command in your development machine: + + vagrant ssh + +To verify that Aurora is installed in the VM, type the `aurora` command. You should see a list +of arguments and possible commands. 
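You can also script a quick health check of the web endpoints listed under *Start the local cluster*. The following small helper is a hypothetical example, not shipped with Aurora; it assumes the default Vagrant IP `192.168.33.7` and the ports shown above, and can be run from your development machine:

```python
# check_cluster.py -- hypothetical helper: verify the local Vagrant cluster
# endpoints are responding. Works with both Python 2 and Python 3.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

ENDPOINTS = [
    ('scheduler',    'http://192.168.33.7:8081'),
    ('observer',     'http://192.168.33.7:1338'),
    ('mesos-master', 'http://192.168.33.7:5050'),
    ('mesos-slave',  'http://192.168.33.7:5051'),
]

for name, url in ENDPOINTS:
    try:
        # Any successful HTTP response means the service is up and listening.
        urlopen(url, timeout=5)
        print('%-12s OK   %s' % (name, url))
    except Exception as e:
        print('%-12s FAIL %s (%s)' % (name, url, e))
```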
+
+The `/vagrant` directory on the VM is mapped to the `aurora/` local directory
+from which you started the cluster. You can edit files inside this directory in your development
+machine and access them from the VM under `/vagrant`.
+
+A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which you
+will use in client commands.
+
+
+Run your first job
+------------------
+
+Now that your cluster is up and running, you are ready to define and run your first job in Aurora.
+For more information, see the [Aurora Tutorial](tutorial.md).
+
+
+Rebuild components
+------------------
+
+If you are changing Aurora code and would like to rebuild a component, you can use the `aurorabuild`
+command on the VM to build and restart a component. This is considerably faster than destroying
+and rebuilding your VM.
+
+`aurorabuild` accepts a list of components to build and update. To get a list of supported
+components, invoke the `aurorabuild` command with no arguments. For example, to rebuild and
+restart just the client, run:
+
+    vagrant ssh -c 'aurorabuild client'
+
+
+Shut down or delete your local cluster
+--------------------------------------
+
+To shut down your local cluster, run the `vagrant halt` command in your development machine. To
+start it again, run the `vagrant up` command.
+
+Once you are finished with your local cluster, or if you would otherwise like to start from scratch,
+you can use the command `vagrant destroy` to turn off and delete the virtual machine.
+
+
+Troubleshooting
+---------------
+
+Most Vagrant-related problems can be fixed with the following steps:
+
+* Destroying the vagrant environment with `vagrant destroy`
+* Killing any orphaned VMs (see AURORA-499) with the VirtualBox UI or the `VBoxManage` command line tool
+* Cleaning the repository of build artifacts and other intermediate output with `git clean -fdx`
+* Bringing up the vagrant environment with `vagrant up`

Modified: aurora/site/source/documentation/latest/index.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/index.html.md?rev=1739402&r1=1739401&r2=1739402&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/index.html.md (original)
+++ aurora/site/source/documentation/latest/index.html.md Sat Apr 16 04:23:06 2016
@@ -1,44 +1,73 @@
 ## Introduction
-Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run long-running services that take advantage of Apache Mesos' scalability, fault-tolerance, and resource isolation. This documentation has been organized into sections with three audiences in mind:
- * Users: General information about the project and to learn how to run an Aurora job.
- * Operators: For those that wish to manage and fine-tune an Aurora cluster.
- * Developers: All the information you need to start modifying Aurora and contributing back to the project.
-
-We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or the `#aurora` IRC channel on `irc.freenode.net`.
- -## Users - * [Install Aurora on virtual machines on your private machine](/documentation/latest/vagrant/) - * [Hello World Tutorial](/documentation/latest/tutorial/) - * [User Guide](/documentation/latest/user-guide/) - * [Configuration Tutorial](/documentation/latest/configuration-tutorial/) - * [Aurora + Thermos Reference](/documentation/latest/configuration-reference/) - * [Command Line Client](/documentation/latest/client-commands/) - * [Client cluster configuration](/documentation/latest/client-cluster-configuration/) - * [Cron Jobs](/documentation/latest/cron-jobs/) +Apache Aurora is a service scheduler that runs on top of Apache Mesos, enabling you to run +long-running services, cron jobs, and ad-hoc jobs that take advantage of Apache Mesos' scalability, +fault-tolerance, and resource isolation. + +We encourage you to ask questions on the [Aurora user list](http://aurora.apache.org/community/) or +the `#aurora` IRC channel on `irc.freenode.net`. + + +## Getting Started +Information for everyone new to Apache Aurora. + + * [Aurora System Overview](getting-started/overview.md) + * [Hello World Tutorial](getting-started/tutorial.md) + * [Local cluster with Vagrant](getting-started/vagrant.md) + +## Features +Description of important Aurora features. + + * [Containers](features/containers.md) + * [Cron Jobs](features/cron-jobs.md) + * [Job Updates](features/job-updates.md) + * [Multitenancy](features/multitenancy.md) + * [Resource Isolation](features/resource-isolation.md) + * [Scheduling Constraints](features/constraints.md) + * [Services](features/services.md) + * [Service Discovery](features/service-discovery.md) + * [SLA Metrics](features/sla-metrics.md) ## Operators - * [Installation](/documentation/latest/installing/) - * [Deployment and cluster configuration](/documentation/latest/deploying-aurora-scheduler/) - * [Security](/documentation/latest/security/) - * [Monitoring](/documentation/latest/monitoring/) - * [Hooks for Aurora Client API](/documentation/latest/hooks/) - * [Scheduler Storage](/documentation/latest/storage/) - * [Scheduler Storage and Maintenance](/documentation/latest/storage-config/) - * [SLA Measurement](/documentation/latest/sla/) - * [Resource Isolation and Sizing](/documentation/latest/resources/) +For those that wish to manage and fine-tune an Aurora cluster. + + * [Installation](operations/installation.md) + * [Configuration](operations/configuration.md) + * [Monitoring](operations/monitoring.md) + * [Security](operations/security.md) + * [Storage](operations/storage.md) + * [Backup](operations/backup-restore.md) + +## Reference +The complete reference of commands, configuration options, and scheduler internals. 
+ + * [Task lifecycle](reference/task-lifecycle.md) + * Configuration (`.aurora` files) + - [Configuration Reference](reference/configuration.md) + - [Configuration Tutorial](reference/configuration-tutorial.md) + - [Configuration Best Practices](reference/configuration-best-practices.md) + - [Configuration Templating](reference/configuration-templating.md) + * Aurora Client + - [Client Commands](reference/client-commands.md) + - [Client Hooks](reference/client-hooks.md) + - [Client Cluster Configuration](reference/client-cluster-configuration.md) + * [Scheduler Configuration](reference/scheduler-configuration.md) + +## Additional Resources + * [Tools integrating with Aurora](additional-resources/tools.md) + * [Presentation videos and slides](additional-resources/presentations.md) ## Developers +All the information you need to start modifying Aurora and contributing back to the project. + * [Contributing to the project](contributing/) - * [Developing the Aurora Scheduler](/documentation/latest/developing-aurora-scheduler/) - * [Developing the Aurora Client](/documentation/latest/developing-aurora-client/) - * [Committers Guide](/documentation/latest/committers/) - * [Design Documents](/documentation/latest/design-documents/) - * [Deprecation Guide](/documentation/latest/thrift-deprecation/) - * [Build System](/documentation/latest/build-system/) - * [Generating test resources](/documentation/latest/test-resource-generation/) + * [Committer's Guide](development/committers-guide.md) + * [Design Documents](development/design-documents.md) + * Developing the Aurora components: + - [Client](development/client.md) + - [Scheduler](development/scheduler.md) + - [Scheduler UI](development/ui.md) + - [Thermos](development/thermos.md) + - [Thrift structures](development/thrift.md) -## Additional Resources - * [Tools integrating with Aurora](/documentation/latest/tools/) - * [Presentation videos and slides](/documentation/latest/presentations/)
