Repository: aurora Updated Branches: refs/heads/master 095009596 -> f28f41a70
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/security.md
----------------------------------------------------------------------
diff --git a/docs/security.md b/docs/security.md
deleted file mode 100644
index 32bea42..0000000
--- a/docs/security.md
+++ /dev/null
@@ -1,279 +0,0 @@

Aurora integrates with [Apache Shiro](http://shiro.apache.org/) to provide security
controls for its API. In addition to providing some useful features out of the box, Shiro
also allows Aurora cluster administrators to adapt the security system to their organization's
existing infrastructure.

- [Enabling Security](#enabling-security)
- [Authentication](#authentication)
  - [HTTP Basic Authentication](#http-basic-authentication)
    - [Server Configuration](#server-configuration)
    - [Client Configuration](#client-configuration)
  - [HTTP SPNEGO Authentication (Kerberos)](#http-spnego-authentication-kerberos)
    - [Server Configuration](#server-configuration-1)
    - [Client Configuration](#client-configuration-1)
- [Authorization](#authorization)
  - [Using an INI file to define security controls](#using-an-ini-file-to-define-security-controls)
    - [Caveats](#caveats)
- [Implementing a Custom Realm](#implementing-a-custom-realm)
  - [Packaging a realm module](#packaging-a-realm-module)
- [Known Issues](#known-issues)

# Enabling Security

There are two major components of security:
[authentication and authorization](http://en.wikipedia.org/wiki/Authentication#Authorization). A
cluster administrator may choose the approach used for each, and may also implement custom
mechanisms for either. Later sections describe the options available.

# Authentication

At a minimum, the scheduler must be configured with instructions for how to process authentication
credentials. There are currently two built-in authentication schemes:
[HTTP Basic Authentication](http://en.wikipedia.org/wiki/Basic_access_authentication) and
[SPNEGO](http://en.wikipedia.org/wiki/SPNEGO) (Kerberos).

## HTTP Basic Authentication

Basic Authentication is a very quick way to add *some* security. It is supported
by all major browsers and HTTP client libraries with minimal work. However,
before relying on Basic Authentication you should be aware of the [security
considerations](http://tools.ietf.org/html/rfc2617#section-4).

### Server Configuration

At a minimum you need to set the following command-line flags on the scheduler:

```
-http_authentication_mechanism=BASIC
-shiro_realm_modules=INI_AUTHNZ
-shiro_ini_path=path/to/security.ini
```

And create a `security.ini` file like so:

```
[users]
sally = apple, admin

[roles]
admin = *
```

The details of the `security.ini` file are explained below. Note that this file contains plaintext,
unhashed passwords.

### Client Configuration

To configure the client for HTTP Basic authentication, add an entry to `~/.netrc` with your credentials:

```
% cat ~/.netrc
# ...

machine aurora.example.com
login sally
password apple

# ...
```

No changes are required to `clusters.json`.
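The `aurora` client picks these credentials up from `~/.netrc` automatically. For ad-hoc calls
against a Basic-auth-protected scheduler from a script, any plain HTTP client can supply the same
credentials. The sketch below is illustrative only and assumes the hypothetical
`aurora.example.com:8081` address and the `sally`/`apple` credentials from the example above.

```python
# Illustrative only: request a scheduler page directly, supplying the Basic-auth
# credentials from the example security.ini (sally/apple). The host and port are
# placeholders for your scheduler's address.
import requests

SCHEDULER_URL = 'http://aurora.example.com:8081'

resp = requests.get(SCHEDULER_URL + '/scheduler', auth=('sally', 'apple'))
print(resp.status_code)  # 200 once the credentials are accepted
```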
## HTTP SPNEGO Authentication (Kerberos)

### Server Configuration

At a minimum you need to set the following command-line flags on the scheduler:

```
-http_authentication_mechanism=NEGOTIATE
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
-kerberos_server_keytab=path/to/aurora.example.com.keytab
-shiro_ini_path=path/to/security.ini
```

And create a `security.ini` file like so:

```
% cat path/to/security.ini
[users]
sally = _, admin

[roles]
admin = *
```

What's going on here? First, Aurora must be configured to request Kerberos credentials when presented with an
unauthenticated request. This is achieved by setting

```
-http_authentication_mechanism=NEGOTIATE
```

Next, a Realm module must be configured to **authenticate** the current request using the Kerberos
credentials that were requested. Aurora ships with a realm module that can do this:

```
-shiro_realm_modules=KERBEROS5_AUTHN[,...]
```

The Kerberos5Realm requires a keytab file and a server principal name. The principal name will usually
be in the form `HTTP/aurora.example.com@EXAMPLE.COM`.

```
-kerberos_server_principal=HTTP/aurora.example.com@EXAMPLE.COM
-kerberos_server_keytab=path/to/aurora.example.com.keytab
```

The Kerberos5 realm module is authentication-only. For scheduler security to work you must also
enable a realm module that provides an Authorizer implementation. For example, to do this using the
IniShiroRealmModule:

```
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ
```

You can then configure authorization using a `security.ini` file as described below
(the password field is ignored). You must configure the realm module with the path to this file:

```
-shiro_ini_path=path/to/security.ini
```

### Client Configuration

To use Kerberos on the client side you must build Kerberos-enabled client binaries. Do this with:

```
./pants binary src/main/python/apache/aurora/kerberos:kaurora
./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin
```

You must also configure each cluster where you've enabled Kerberos on the scheduler
to use Kerberos authentication. Do this by setting `auth_mechanism` to `KERBEROS`
in `clusters.json`:

```
% cat ~/.aurora/clusters.json
{
  "devcluster": {
    "auth_mechanism": "KERBEROS",
    ...
  },
  ...
}
```

# Authorization

Given a means to authenticate the entity a client claims to be, we need to define what privileges it has.

## Using an INI file to define security controls

The simplest security configuration for Aurora is an INI file on the scheduler. For small
clusters, or clusters where the users and access controls change relatively infrequently, this is
likely the preferred approach. However, you may want to avoid this approach if access permissions
are rapidly changing, or if your access control information already exists in another system.

You can enable INI-based configuration with the following scheduler command-line arguments:

```
-http_authentication_mechanism=BASIC
-shiro_ini_path=path/to/security.ini
```

*note* As the argument name reveals, this uses Shiro's
[IniRealm](http://shiro.apache.org/configuration.html#Configuration-INIConfiguration) behind
the scenes.

The INI file will contain two sections - users and roles.
Here's an example of what might be in `security.ini`:

```
[users]
sally = apple, admin
jim = 123456, accounting
becky = letmein, webapp
larry = 654321, accounting
steve = password

[roles]
admin = *
accounting = thrift.AuroraAdmin:setQuota
webapp = thrift.AuroraSchedulerManager:*:webapp
```

The users section defines user credentials and the role(s) they are members of. These lines
are of the format `<user> = <password>[, <role>...]`. As you probably noticed, the passwords are
in plaintext, so read access to this file should be restricted.

In this configuration, each user has different privileges for actions in the cluster because
of the roles they are a part of:

* admin is granted all privileges
* accounting may adjust the amount of resource quota for any role
* webapp represents a collection of jobs that make up a service, and its members may create and modify any jobs owned by it

### Caveats

You might find documentation on the Internet suggesting there are additional sections in `shiro.ini`,
like `[main]` and `[urls]`. These are not supported by Aurora, as it uses a different mechanism to configure
those parts of Shiro. Think of Aurora's `security.ini` as a subset with only `[users]` and `[roles]` sections.

## Implementing Delegated Authorization

It is possible to leverage Shiro's `runAs` feature by implementing a custom Servlet Filter that provides
the capability and passing its fully qualified class name to the command-line argument
`-shiro_after_auth_filter`. The filter is registered in the same filter chain as the Shiro auth filters
and is placed after them, which ensures that the filter is invoked only after the Shiro filters have had
a chance to authenticate the request.

# Implementing a Custom Realm

Since Aurora's security is backed by [Apache Shiro](https://shiro.apache.org), you can implement a
custom [Realm](http://shiro.apache.org/realm.html) to define organization-specific security behavior.

In addition to using Shiro's standard APIs to implement a Realm, you can link against Aurora to
access the type-safe Permissions Aurora uses. See the Javadoc for `org.apache.aurora.scheduler.spi`
for more information.

## Packaging a realm module

Package your custom Realm(s) with a Guice module that exposes a `Set<Realm>` multibinding.

```java
package com.example;

import com.google.inject.AbstractModule;
import com.google.inject.multibindings.Multibinder;
import org.apache.shiro.realm.Realm;

public class MyRealmModule extends AbstractModule {
  @Override
  public void configure() {
    Realm myRealm = new MyRealm();

    Multibinder.newSetBinder(binder(), Realm.class).addBinding().toInstance(myRealm);
  }

  static class MyRealm implements Realm {
    // Realm implementation.
  }
}
```

To use your module in the scheduler, include it as a realm module based on its fully-qualified
class name:

```
-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule
```

# Known Issues

While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental
improvements. Please follow, vote, or send patches.
- -Relevant tickets: -* [AURORA-343](https://issues.apache.org/jira/browse/AURORA-343): HTTPS support -* [AURORA-1248](https://issues.apache.org/jira/browse/AURORA-1248): Client retries 4xx errors -* [AURORA-1279](https://issues.apache.org/jira/browse/AURORA-1279): Remove kerberos-specific build targets -* [AURORA-1293](https://issues.apache.org/jira/browse/AURORA-1291): Consider defining a JSON format in place of INI -* [AURORA-1179](https://issues.apache.org/jira/browse/AURORA-1179): Supported hashed passwords in security.ini -* [AURORA-1295](https://issues.apache.org/jira/browse/AURORA-1295): Support security for the ReadOnlyScheduler service http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/sla.md ---------------------------------------------------------------------- diff --git a/docs/sla.md b/docs/sla.md deleted file mode 100644 index a558e00..0000000 --- a/docs/sla.md +++ /dev/null @@ -1,177 +0,0 @@ -Aurora SLA Measurement --------------- - -- [Overview](#overview) -- [Metric Details](#metric-details) - - [Platform Uptime](#platform-uptime) - - [Job Uptime](#job-uptime) - - [Median Time To Assigned (MTTA)](#median-time-to-assigned-\(mtta\)) - - [Median Time To Running (MTTR)](#median-time-to-running-\(mttr\)) -- [Limitations](#limitations) - -## Overview - -The primary goal of the feature is collection and monitoring of Aurora job SLA (Service Level -Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform -and hosted services. - -The Aurora SLA feature is by default only enabled for service (non-cron) -production jobs (`"production = True"` in your `.aurora` config). It can be enabled for -non-production services via the scheduler command line flag `-sla_non_prod_metrics`. - -Counters that track SLA measurements are computed periodically within the scheduler. -The individual instance metrics are refreshed every minute (configurable via -`sla_stat_refresh_interval`). The instance counters are subsequently aggregated by -relevant grouping types before exporting to scheduler `/vars` endpoint (when using `vagrant` -that would be `http://192.168.33.7:8081/vars`) - -## Metric Details - -### Platform Uptime - -*Aggregate amount of time a job spends in a non-runnable state due to platform unavailability -or scheduling delays. This metric tracks Aurora/Mesos uptime performance and reflects on any -system-caused downtime events (tasks LOST or DRAINED). Any user-initiated task kills/restarts -will not degrade this metric.* - -**Collection scope:** - -* Per job - `sla_<job_key>_platform_uptime_percent` -* Per cluster - `sla_cluster_platform_uptime_percent` - -**Units:** percent - -A fault in the task environment may cause the Aurora/Mesos to have different views on the task state -or lose track of the task existence. In such cases, the service task is marked as LOST and -rescheduled by Aurora. For example, this may happen when the task stays in ASSIGNED or STARTING -for too long or the Mesos slave becomes unhealthy (or disappears completely). The time between -task entering LOST and its replacement reaching RUNNING state is counted towards platform downtime. - -Another example of a platform downtime event is the administrator-requested task rescheduling. This -happens during planned Mesos slave maintenance when all slave tasks are marked as DRAINED and -rescheduled elsewhere. - -To accurately calculate Platform Uptime, we must separate platform incurred downtime from user -actions that put a service instance in a non-operational state. 
It is simpler to isolate -user-incurred downtime and treat all other downtime as platform incurred. - -Currently, a user can cause a healthy service (task) downtime in only two ways: via `killTasks` -or `restartShards` RPCs. For both, their affected tasks leave an audit state transition trail -relevant to uptime calculations. By applying a special "SLA meaning" to exposed task state -transition records, we can build a deterministic downtime trace for every given service instance. - -A task going through a state transition carries one of three possible SLA meanings -(see [SlaAlgorithm.java](../src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java) for -sla-to-task-state mapping): - -* Task is UP: starts a period where the task is considered to be up and running from the Aurora - platform standpoint. - -* Task is DOWN: starts a period where the task cannot reach the UP state for some - non-user-related reason. Counts towards instance downtime. - -* Task is REMOVED from SLA: starts a period where the task is not expected to be UP due to - user initiated action or failure. We ignore this period for the uptime calculation purposes. - -This metric is recalculated over the last sampling period (last minute) to account for -any UP/DOWN/REMOVED events. It ignores any UP/DOWN events not immediately adjacent to the -sampling interval as well as adjacent REMOVED events. - -### Job Uptime - -*Percentage of the job instances considered to be in RUNNING state for the specified duration -relative to request time. This is a purely application side metric that is considering aggregate -uptime of all RUNNING instances. Any user- or platform initiated restarts directly affect -this metric.* - -**Collection scope:** We currently expose job uptime values at 5 pre-defined -percentiles (50th,75th,90th,95th and 99th): - -* `sla_<job_key>_job_uptime_50_00_sec` -* `sla_<job_key>_job_uptime_75_00_sec` -* `sla_<job_key>_job_uptime_90_00_sec` -* `sla_<job_key>_job_uptime_95_00_sec` -* `sla_<job_key>_job_uptime_99_00_sec` - -**Units:** seconds -You can also get customized real-time stats from aurora client. See `aurora sla -h` for -more details. - -### Median Time To Assigned (MTTA) - -*Median time a job spends waiting for its tasks to be assigned to a host. This is a combined -metric that helps track the dependency of scheduling performance on the requested resources -(user scope) as well as the internal scheduler bin-packing algorithm efficiency (platform scope).* - -**Collection scope:** - -* Per job - `sla_<job_key>_mtta_ms` -* Per cluster - `sla_cluster_mtta_ms` -* Per instance size (small, medium, large, x-large, xx-large). Size are defined in: -[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java) - * By CPU: - * `sla_cpu_small_mtta_ms` - * `sla_cpu_medium_mtta_ms` - * `sla_cpu_large_mtta_ms` - * `sla_cpu_xlarge_mtta_ms` - * `sla_cpu_xxlarge_mtta_ms` - * By RAM: - * `sla_ram_small_mtta_ms` - * `sla_ram_medium_mtta_ms` - * `sla_ram_large_mtta_ms` - * `sla_ram_xlarge_mtta_ms` - * `sla_ram_xxlarge_mtta_ms` - * By DISK: - * `sla_disk_small_mtta_ms` - * `sla_disk_medium_mtta_ms` - * `sla_disk_large_mtta_ms` - * `sla_disk_xlarge_mtta_ms` - * `sla_disk_xxlarge_mtta_ms` - -**Units:** milliseconds - -MTTA only considers instances that have already reached ASSIGNED state and ignores those -that are still PENDING. This ensures straggler instances (e.g. with unreasonable resource -constraints) do not affect metric curves. 
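All of the counters above are exported through the scheduler `/vars` endpoint, so they can be pulled
into an external monitoring system with a few lines of code. The sketch below is a rough
illustration, assuming the vagrant scheduler address mentioned earlier and a plain-text
`name value` line format for `/vars`; adjust the parsing to whatever your scheduler build actually emits.

```python
# Illustrative sketch: pull selected cluster-level SLA counters from /vars.
# Assumes a plain-text "name value" line format and the vagrant scheduler address.
import requests

VARS_URL = 'http://192.168.33.7:8081/vars'
WANTED = ('sla_cluster_platform_uptime_percent', 'sla_cluster_mtta_ms', 'sla_cluster_mttr_ms')

def fetch_sla_vars():
    metrics = {}
    for line in requests.get(VARS_URL).text.splitlines():
        name, _, value = line.partition(' ')
        if name in WANTED:
            metrics[name] = float(value)
    return metrics

print(fetch_sla_vars())
```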
- -### Median Time To Running (MTTR) - -*Median time a job waits for its tasks to reach RUNNING state. This is a comprehensive metric -reflecting on the overall time it takes for the Aurora/Mesos to start executing user content.* - -**Collection scope:** - -* Per job - `sla_<job_key>_mttr_ms` -* Per cluster - `sla_cluster_mttr_ms` -* Per instance size (small, medium, large, x-large, xx-large). Size are defined in: -[ResourceAggregates.java](../src/main/java/org/apache/aurora/scheduler/base/ResourceAggregates.java) - * By CPU: - * `sla_cpu_small_mttr_ms` - * `sla_cpu_medium_mttr_ms` - * `sla_cpu_large_mttr_ms` - * `sla_cpu_xlarge_mttr_ms` - * `sla_cpu_xxlarge_mttr_ms` - * By RAM: - * `sla_ram_small_mttr_ms` - * `sla_ram_medium_mttr_ms` - * `sla_ram_large_mttr_ms` - * `sla_ram_xlarge_mttr_ms` - * `sla_ram_xxlarge_mttr_ms` - * By DISK: - * `sla_disk_small_mttr_ms` - * `sla_disk_medium_mttr_ms` - * `sla_disk_large_mttr_ms` - * `sla_disk_xlarge_mttr_ms` - * `sla_disk_xxlarge_mttr_ms` - -**Units:** milliseconds - -MTTR only considers instances in RUNNING state. This ensures straggler instances (e.g. with -unreasonable resource constraints) do not affect metric curves. - -## Limitations - -* The availability of Aurora SLA metrics is bound by the scheduler availability. - -* All metrics are calculated at a pre-defined interval (currently set at 1 minute). - Scheduler restarts may result in missed collections. http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/storage-config.md ---------------------------------------------------------------------- diff --git a/docs/storage-config.md b/docs/storage-config.md deleted file mode 100644 index 7c64841..0000000 --- a/docs/storage-config.md +++ /dev/null @@ -1,153 +0,0 @@ -# Storage Configuration And Maintenance - -- [Overview](#overview) -- [Scheduler storage configuration flags](#scheduler-storage-configuration-flags) - - [Mesos replicated log configuration flags](#mesos-replicated-log-configuration-flags) - - [-native_log_quorum_size](#-native_log_quorum_size) - - [-native_log_file_path](#-native_log_file_path) - - [-native_log_zk_group_path](#-native_log_zk_group_path) - - [Backup configuration flags](#backup-configuration-flags) - - [-backup_interval](#-backup_interval) - - [-backup_dir](#-backup_dir) - - [-max_saved_backups](#-max_saved_backups) -- [Recovering from a scheduler backup](#recovering-from-a-scheduler-backup) - - [Summary](#summary) - - [Preparation](#preparation) - - [Cleanup and re-initialize Mesos replicated log](#cleanup-and-re-initialize-mesos-replicated-log) - - [Restore from backup](#restore-from-backup) - - [Cleanup](#cleanup) - -## Overview - -This document summarizes Aurora storage configuration and maintenance details and is -intended for use by anyone deploying and/or maintaining Aurora. - -For a high level overview of the Aurora storage architecture refer to [this document](storage.md). - -## Scheduler storage configuration flags - -Below is a summary of scheduler storage configuration flags that either don't have default values -or require attention before deploying in a production environment. - -### Mesos replicated log configuration flags - -#### -native_log_quorum_size -Defines the Mesos replicated log quorum size. See -[the replicated log configuration document](deploying-aurora-scheduler.md#replicated-log-configuration) -on how to choose the right value. - -#### -native_log_file_path -Location of the Mesos replicated log files. 
Consider allocating a dedicated disk (preferably SSD) -for Mesos replicated log files to ensure optimal storage performance. - -#### -native_log_zk_group_path -ZooKeeper path used for Mesos replicated log quorum discovery. - -See [code](../src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java) for -other available Mesos replicated log configuration options and default values. - -### Backup configuration flags - -Configuration options for the Aurora scheduler backup manager. - -#### -backup_interval -The interval on which the scheduler writes local storage backups. The default is every hour. - -#### -backup_dir -Directory to write backups to. - -#### -max_saved_backups -Maximum number of backups to retain before deleting the oldest backup(s). - -## Recovering from a scheduler backup - -**Be sure to read the entire page before attempting to restore from a backup, as it may have -unintended consequences.** - -### Summary - -The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an -earlier, backed up, version and requires all schedulers to be taken down temporarily while -restoring. Once completed, the scheduler state resets to what it was when the backup was created. -This means any jobs/tasks created or updated after the backup are unknown to the scheduler and will -be killed shortly after the cluster restarts. All other tasks continue operating as normal. - -Usually, it is a bad idea to restore a backup that is not extremely recent (i.e. older than a few -hours). This is because the scheduler will expect the cluster to look exactly as the backup does, -so any tasks that have been rescheduled since the backup was taken will be killed. - -Instructions below have been verified in [Vagrant environment](vagrant.md) and with minor -syntax/path changes should be applicable to any Aurora cluster. - -### Preparation - -Follow these steps to prepare the cluster for restoring from a backup: - -* Stop all scheduler instances - -* Consider blocking external traffic on a port defined in `-http_port` for all schedulers to -prevent users from interacting with the scheduler during the restoration process. This will help -troubleshooting by reducing the scheduler log noise and prevent users from making changes that will -be erased after the backup snapshot is restored. - -* Configure `aurora_admin` access to run all commands listed in - [Restore from backup](#restore-from-backup) section locally on the leading scheduler: - * Make sure the [clusters.json](client-commands.md#cluster-configuration) file configured to - access scheduler directly. Set `scheduler_uri` setting and remove `zk`. Since leader can get - re-elected during the restore steps, consider doing it on all scheduler replicas. - * Depending on your particular security approach you will need to either turn off scheduler - authorization by removing scheduler `-http_authentication_mechanism` flag or make sure the - direct scheduler access is properly authorized. E.g.: in case of Kerberos you will need to make - a `/etc/hosts` file change to match your local IP to the scheduler URL configured in keytabs: - - <local_ip> <scheduler_domain_in_keytabs> - -* Next steps are required to put scheduler into a partially disabled state where it would still be -able to accept storage recovery requests but unable to schedule or change task states. This may be -accomplished by updating the following scheduler configuration options: - * Set `-mesos_master_address` to a non-existent zk address. 
This will prevent scheduler from - registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:1111/mesos/master` - * `-max_registration_delay` - set to sufficiently long interval to prevent registration timeout - and as a result scheduler suicide. E.g: `-max_registration_delay=360mins` - * Make sure `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to - prevent accidental task GC. This is important as scheduler will attempt to reconcile the cluster - state and will kill all tasks when restarted with an empty Mesos replicated log. - -* Restart all schedulers - -### Cleanup and re-initialize Mesos replicated log - -Get rid of the corrupted files and re-initialize Mesos replicated log: - -* Stop schedulers -* Delete all files under `-native_log_file_path` on all schedulers -* Initialize Mesos replica's log file: `sudo mesos-log initialize --path=<-native_log_file_path>` -* Start schedulers - -### Restore from backup - -At this point the scheduler is ready to rehydrate from the backup: - -* Identify the leading scheduler by: - * examining the `scheduler_lifecycle_LEADER_AWAITING_REGISTRATION` metric at the scheduler - `/vars` endpoint. Leader will have 1. All other replicas - 0. - * examining scheduler logs - * or examining Zookeeper registration under the path defined by `-zk_endpoints` - and `-serverset_path` - -* Locate the desired backup file, copy it to the leading scheduler's `-backup_dir` folder and stage -recovery by running the following command on a leader -`aurora_admin scheduler_stage_recovery --bypass-leader-redirect <cluster> scheduler-backup-<yyyy-MM-dd-HH-mm>` - -* At this point, the recovery snapshot is staged and available for manual verification/modification -via `aurora_admin scheduler_print_recovery_tasks --bypass-leader-redirect` and -`scheduler_delete_recovery_tasks --bypass-leader-redirect` commands. -See `aurora_admin help <command>` for usage details. - -* Commit recovery. This instructs the scheduler to overwrite the existing Mesos replicated log with -the provided backup snapshot and initiate a mandatory failover -`aurora_admin scheduler_commit_recovery --bypass-leader-redirect <cluster>` - -### Cleanup -Undo any modification done during [Preparation](#preparation) sequence. - http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/storage.md ---------------------------------------------------------------------- diff --git a/docs/storage.md b/docs/storage.md deleted file mode 100644 index 6ffed54..0000000 --- a/docs/storage.md +++ /dev/null @@ -1,88 +0,0 @@ -#Aurora Scheduler Storage - -- [Overview](#overview) -- [Reads, writes, modifications](#reads-writes-modifications) - - [Read lifecycle](#read-lifecycle) - - [Write lifecycle](#write-lifecycle) -- [Atomicity, consistency and isolation](#atomicity-consistency-and-isolation) -- [Population on restart](#population-on-restart) - -## Overview - -Aurora scheduler maintains data that need to be persisted to survive failovers and restarts. -For example: - -* Task configurations and scheduled task instances -* Job update configurations and update progress -* Production resource quotas -* Mesos resource offer host attributes - -Aurora solves its persistence needs by leveraging the Mesos implementation of a Paxos replicated -log [[1]](https://ramcloud.stanford.edu/~ongaro/userstudy/paxos.pdf) -[[2]](http://en.wikipedia.org/wiki/State_machine_replication) with a key-value -[LevelDB](https://github.com/google/leveldb) storage as persistence media. 
Conceptually, it can be represented by the following major components:

* Volatile storage: in-memory cache of all available data. Implemented via an in-memory
[H2 Database](http://www.h2database.com/html/main.html) and accessed via
[MyBatis](http://mybatis.github.io/mybatis-3/).
* Log manager: interface between Aurora storage and the Mesos replicated log. The default schema format
is [thrift](https://github.com/apache/thrift). Data is stored in serialized binary form.
* Snapshot manager: all data is periodically persisted in the Mesos replicated log in a single snapshot.
This helps establish periodic recovery checkpoints and speeds up volatile storage recovery on
restart.
* Backup manager: as a precaution, snapshots are periodically written out into backup files.
This solves a [disaster recovery problem](storage-config.md#recovering-from-a-scheduler-backup)
in case of a complete loss or corruption of Mesos log files.

## Reads, writes, modifications

All services in Aurora access data via a set of predefined store interfaces (aka stores) logically
grouped by the type of data they serve. Every interface defines a specific set of operations allowed
on the data, thus abstracting out the storage access and the actual persistence implementation. The
latter is especially important in view of the general immutability of persisted data. With the Mesos
replicated log as the underlying persistence solution, data can be read and written easily but not
modified. All modifications are simulated by saving new versions of modified objects. This feature
and general performance considerations justify the existence of the volatile in-memory store.

### Read lifecycle

There are two types of reads available in Aurora: consistent and weakly-consistent. The difference
is explained [below](#atomicity-consistency-and-isolation).

All reads are served from the volatile storage, making reads generally cheap storage operations
from the performance standpoint. The majority of the volatile stores are backed by the
in-memory H2 database. This allows for rich schema definitions, queries and relationships that
key-value storage is unable to match.

### Write lifecycle

Writes are more involved operations since, in addition to updating the volatile store, data has to be
appended to the replicated log. Data is not available for reads until fully acknowledged by both the
replicated log and volatile storage.

## Atomicity, consistency and isolation

Aurora uses [write-ahead logging](http://en.wikipedia.org/wiki/Write-ahead_logging) to ensure
consistency between replicated and volatile storage. In Aurora, data is first written into the
replicated log and only then updated in the volatile store.

Aurora storage uses read-write locks to serialize data mutations and provide a consistent view of the
available data. The `Storage` interface exposes 3 major types of operations:

* `consistentRead` - access is locked using the reader's lock and provides a consistent view on read
* `weaklyConsistentRead` - access is lock-less. Delivers the best contention performance but may result
in stale reads
* `write` - access is fully serialized by using the writer's lock. Operation success requires both
volatile and replicated writes to succeed.

The consistency of the volatile store is enforced via H2 transactional isolation.
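As a purely conceptual illustration of the write-ahead ordering just described (this is not Aurora
code; the real scheduler uses the Mesos replicated log, H2, and thrift serialization rather than the
toy structures below), a store that appends every mutation to a log before touching its in-memory
view can always rebuild that view by replaying the log:

```python
# Conceptual sketch of write-ahead logging, not Aurora code: a mutation is
# appended to a durable log before the in-memory view is updated, so the
# volatile state can always be rebuilt by replaying the log.
import threading

class WriteAheadStore:
    def __init__(self):
        self._log = []                   # stands in for the replicated log
        self._state = {}                 # stands in for the volatile in-memory store
        self._lock = threading.RLock()   # stands in for the writer's lock

    def write(self, key, value):
        with self._lock:
            self._log.append((key, value))   # 1. append to the log first
            self._state[key] = value         # 2. then update the volatile store

    def consistent_read(self, key):
        with self._lock:                     # serialized with writes
            return self._state.get(key)

    def recover(self):
        """Rebuild the volatile state by replaying the log, as on restart."""
        with self._lock:
            self._state = {}
            for key, value in self._log:
                self._state[key] = value
```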
- -## Population on restart - -Any time a scheduler restarts, it restores its volatile state from the most recent position recorded -in the replicated log by restoring the snapshot and replaying individual log entries on top to fully -recover the state up to the last write. - http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/task-lifecycle.md ---------------------------------------------------------------------- diff --git a/docs/task-lifecycle.md b/docs/task-lifecycle.md deleted file mode 100644 index 5d6456c..0000000 --- a/docs/task-lifecycle.md +++ /dev/null @@ -1,146 +0,0 @@ -# Task Lifecycle - -When Aurora reads a configuration file and finds a `Job` definition, it: - -1. Evaluates the `Job` definition. -2. Splits the `Job` into its constituent `Task`s. -3. Sends those `Task`s to the scheduler. -4. The scheduler puts the `Task`s into `PENDING` state, starting each - `Task`'s life cycle. - - - - -Please note, a couple of task states described below are missing from -this state diagram. - - -## PENDING to RUNNING states - -When a `Task` is in the `PENDING` state, the scheduler constantly -searches for machines satisfying that `Task`'s resource request -requirements (RAM, disk space, CPU time) while maintaining configuration -constraints such as "a `Task` must run on machines dedicated to a -particular role" or attribute limit constraints such as "at most 2 -`Task`s from the same `Job` may run on each rack". When the scheduler -finds a suitable match, it assigns the `Task` to a machine and puts the -`Task` into the `ASSIGNED` state. - -From the `ASSIGNED` state, the scheduler sends an RPC to the slave -machine containing `Task` configuration, which the slave uses to spawn -an executor responsible for the `Task`'s lifecycle. When the scheduler -receives an acknowledgment that the machine has accepted the `Task`, -the `Task` goes into `STARTING` state. - -`STARTING` state initializes a `Task` sandbox. When the sandbox is fully -initialized, Thermos begins to invoke `Process`es. Also, the slave -machine sends an update to the scheduler that the `Task` is -in `RUNNING` state. - - - -## RUNNING to terminal states - -There are various ways that an active `Task` can transition into a terminal -state. By definition, it can never leave this state. However, depending on -nature of the termination and the originating `Job` definition -(e.g. `service`, `max_task_failures`), a replacement `Task` might be -scheduled. - -### Natural Termination: FINISHED, FAILED - -A `RUNNING` `Task` can terminate without direct user interaction. For -example, it may be a finite computation that finishes, even something as -simple as `echo hello world.`, or it could be an exceptional condition in -a long-lived service. If the `Task` is successful (its underlying -processes have succeeded with exit status `0` or finished without -reaching failure limits) it moves into `FINISHED` state. If it finished -after reaching a set of failure limits, it goes into `FAILED` state. - -A terminated `TASK` which is subject to rescheduling will be temporarily -`THROTTLED`, if it is considered to be flapping. A task is flapping, if its -previous invocation was terminated after less than 5 minutes (scheduler -default). The time penalty a task has to remain in the `THROTTLED` state, -before it is eligible for rescheduling, increases with each consecutive -failure. - -### Forceful Termination: KILLING, RESTARTING - -You can terminate a `Task` by issuing an `aurora job kill` command, which -moves it into `KILLING` state. 
The scheduler then sends the slave a -request to terminate the `Task`. If the scheduler receives a successful -response, it moves the Task into `KILLED` state and never restarts it. - -If a `Task` is forced into the `RESTARTING` state via the `aurora job restart` -command, the scheduler kills the underlying task but in parallel schedules -an identical replacement for it. - -In any case, the responsible executor on the slave follows an escalation -sequence when killing a running task: - - 1. If a `HttpLifecycleConfig` is not present, skip to (4). - 2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds. - 3. Send a POST to the `shutdown_endpoint` and wait 5 seconds. - 4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds. - 5. Send SIGKILL (`kill -9`). - -If the executor notices that all `Process`es in a `Task` have aborted -during this sequence, it will not proceed with subsequent steps. -Note that graceful shutdown is best-effort, and due to the many -inevitable realities of distributed systems, it may not be performed. - -### Unexpected Termination: LOST - -If a `Task` stays in a transient task state for too long (such as `ASSIGNED` -or `STARTING`), the scheduler forces it into `LOST` state, creating a new -`Task` in its place that's sent into `PENDING` state. - -In addition, if the Mesos core tells the scheduler that a slave has -become unhealthy (or outright disappeared), the `Task`s assigned to that -slave go into `LOST` state and new `Task`s are created in their place. -From `PENDING` state, there is no guarantee a `Task` will be reassigned -to the same machine unless job constraints explicitly force it there. - -### Giving Priority to Production Tasks: PREEMPTING - -Sometimes a Task needs to be interrupted, such as when a non-production -Task's resources are needed by a higher priority production Task. This -type of interruption is called a *pre-emption*. When this happens in -Aurora, the non-production Task is killed and moved into -the `PREEMPTING` state when both the following are true: - -- The task being killed is a non-production task. -- The other task is a `PENDING` production task that hasn't been - scheduled due to a lack of resources. - -The scheduler UI shows the non-production task was preempted in favor of -the production task. At some point, tasks in `PREEMPTING` move to `KILLED`. - -Note that non-production tasks consuming many resources are likely to be -preempted in favor of production tasks. - -### Making Room for Maintenance: DRAINING - -Cluster operators can set slave into maintenance mode. This will transition -all `Task` running on this slave into `DRAINING` and eventually to `KILLED`. -Drained `Task`s will be restarted on other slaves for which no maintenance -has been announced yet. - - - -## State Reconciliation - -Due to the many inevitable realities of distributed systems, there might -be a mismatch of perceived and actual cluster state (e.g. a machine returns -from a `netsplit` but the scheduler has already marked all its `Task`s as -`LOST` and rescheduled them). - -Aurora regularly runs a state reconciliation process in order to detect -and correct such issues (e.g. by killing the errant `RUNNING` tasks). -By default, the proper detection of all failure scenarios and inconsistencies -may take up to an hour. - -To emphasize this point: there is no uniqueness guarantee for a single -instance of a job in the presence of network partitions. 
If the `Task` -requires that, it should be baked in at the application level using a -distributed coordination service such as Zookeeper. http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/test-resource-generation.md ---------------------------------------------------------------------- diff --git a/docs/test-resource-generation.md b/docs/test-resource-generation.md deleted file mode 100644 index e78e742..0000000 --- a/docs/test-resource-generation.md +++ /dev/null @@ -1,24 +0,0 @@ -# Generating test resources - -## Background -The Aurora source repository and distributions contain several -[binary files](../src/test/resources/org/apache/thermos/root/checkpoints) to -qualify the backwards-compatibility of thermos with checkpoint data. Since -thermos persists state to disk, to be read by the thermos observer), it is important that we have -tests that prevent regressions affecting the ability to parse previously-written data. - -## Generating test files -The files included represent persisted checkpoints that exercise different -features of thermos. The existing files should not be modified unless -we are accepting backwards incompatibility, such as with a major release. - -It is not practical to write source code to generate these files on the fly, -as source would be vulnerable to drift (e.g. due to refactoring) in ways -that would undermine the goal of ensuring backwards compatibility. - -The most common reason to add a new checkpoint file would be to provide -coverage for new thermos features that alter the data format. This is -accomplished by writing and running a -[job configuration](configuration-reference.md) that exercises the feature, and -copying the checkpoint file from the sandbox directory, by default this is -`/var/run/thermos/checkpoints/<aurora task id>`. http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/thrift-deprecation.md ---------------------------------------------------------------------- diff --git a/docs/thrift-deprecation.md b/docs/thrift-deprecation.md deleted file mode 100644 index 62a71bc..0000000 --- a/docs/thrift-deprecation.md +++ /dev/null @@ -1,54 +0,0 @@ -# Thrift API Changes - -## Overview -Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in -client/server RPC protocol as well as for internal data storage. While Thrift is capable of -correctly handling additions and renames of the existing members, field removals must be done -carefully to ensure backwards compatibility and provide predictable deprecation cycle. This -document describes general guidelines for making Thrift schema changes to the existing fields in -[api.thrift](../api/src/main/thrift/org/apache/aurora/gen/api.thrift). - -It is highly recommended to go through the -[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on -basic Thrift schema concepts. - -## Checklist -Every existing Thrift schema modification is unique in its requirements and must be analyzed -carefully to identify its scope and expected consequences. The following checklist may help in that -analysis: -* Is this a new field/struct? If yes, go ahead -* Is this a pure field/struct rename without any type/structure change? 
If yes, go ahead and rename -* Anything else, read further to make sure your change is properly planned - -## Deprecation cycle -Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle -must be followed: - -### vCurrent -Change is applied in a way that does not break scheduler/client with this version to -communicate with scheduler/client from vCurrent-1. -* Do not remove or rename the old field -* Add a new field as an eventual replacement of the old one and implement a dual read/write -anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns -are marked as `NOT NULL` -* Check [storage.thrift](../api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if the -affected struct is stored in Aurora scheduler storage. If so, you most likely need to backfill -existing data to ensure both fields are populated eagerly on startup. See -[this patch](https://reviews.apache.org/r/43172) as a real-life example of thrift-struct -backfilling. IMPORTANT: backfilling implementation needs to ensure both fields are populated. This -is critical to enable graceful scheduler upgrade as well as rollback to the old version if needed. -* Add a deprecation jira ticket into the vCurrent+1 release candidate -* Add a TODO for the deprecated field mentioning the jira ticket - -### vCurrent+1 -Finalize the change by removing the deprecated fields from the Thrift schema. -* Drop any dual read/write routines added in the previous version -* Remove thrift backfilling in scheduler -* Remove the deprecated Thrift field - -## Testing -It's always advisable to test your changes in the local vagrant environment to build more -confidence that you change is backwards compatible. It's easy to simulate different -client/scheduler versions by playing with `aurorabuild` command. See [this document](vagrant.md) -for more. - http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/tools.md ---------------------------------------------------------------------- diff --git a/docs/tools.md b/docs/tools.md deleted file mode 100644 index 2ae550d..0000000 --- a/docs/tools.md +++ /dev/null @@ -1,16 +0,0 @@ -# Tools - -Various tools integrate with Aurora. Is there a tool missing? Let us know, or submit a patch to add it! 
* Load-balancing technology used to direct traffic to services running on Aurora
  - [synapse](https://github.com/airbnb/synapse) based on HAProxy
  - [aurproxy](https://github.com/tellapart/aurproxy) based on nginx
  - [jobhopper](https://github.com/benley/aurora-jobhopper) performing HTTP redirects for easy developer and administrator access

* Monitoring
  - [collectd-aurora](https://github.com/zircote/collectd-aurora) for cluster monitoring using collectd
  - [Prometheus Aurora exporter](https://github.com/tommyulfsparre/aurora_exporter) for cluster monitoring using Prometheus
  - [Prometheus service discovery integration](http://prometheus.io/docs/operating/configuration/#zookeeper-serverset-sd-configurations-serverset_sd_config) for discovering and monitoring services running on Aurora

* Packaging and deployment
  - [aurora-packaging](https://github.com/apache/aurora-packaging), the source of the official Aurora packages

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/tutorial.md
----------------------------------------------------------------------
diff --git a/docs/tutorial.md b/docs/tutorial.md
deleted file mode 100644
index 95539ef..0000000
--- a/docs/tutorial.md
+++ /dev/null
@@ -1,260 +0,0 @@

# Aurora Tutorial

This tutorial shows how to use the Aurora scheduler to run (and "`printf-debug`")
a hello world program on Mesos. This is the recommended document for new Aurora users
to start getting up to speed on the system.

- [Prerequisite](#prerequisite)
- [The Script](#the-script)
- [Aurora Configuration](#aurora-configuration)
- [Creating the Job](#creating-the-job)
- [Watching the Job Run](#watching-the-job-run)
- [Cleanup](#cleanup)
- [Next Steps](#next-steps)

## Prerequisite

This tutorial assumes you are running [Aurora locally using Vagrant](vagrant.md).
However, in general the instructions are also applicable to any other
[Aurora installation](installing.md).

Unless otherwise stated, all commands are to be run from the root of the aurora
repository clone.

## The Script

Our "hello world" application is a simple Python script that loops
forever, displaying the time every few seconds. Copy the code below and
put it in a file named `hello_world.py` in the root of your Aurora repository clone
(note: this directory is the same as `/vagrant` inside the Vagrant VMs).

The script has an intentional bug, which we will explain later on.

<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
-->
```python
import time

def main():
  SLEEP_DELAY = 10
  # Python ninjas - ignore this blatant bug.
  for i in xrang(100):
    print("Hello world! The time is now: %s. Sleeping for %d secs" % (
      time.asctime(), SLEEP_DELAY))
    time.sleep(SLEEP_DELAY)

if __name__ == "__main__":
  main()
```

## Aurora Configuration

Once we have our script/program, we need to create a *configuration
file* that tells Aurora how to manage and launch our Job. Save the code below
in the file `hello_world.aurora`.

<!-- NOTE: If you are changing this file, be sure to also update examples/vagrant/test_tutorial.sh.
-->
```python
pkg_path = '/vagrant/hello_world.py'

# we use a trick here to make the configuration change with
# the contents of the file, for simplicity. In a normal setting, packages would be
# versioned, and the version number would be changed in the configuration.
-import hashlib -with open(pkg_path, 'rb') as f: - pkg_checksum = hashlib.md5(f.read()).hexdigest() - -# copy hello_world.py into the local sandbox -install = Process( - name = 'fetch_package', - cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum)) - -# run the script -hello_world = Process( - name = 'hello_world', - cmdline = 'python -u hello_world.py') - -# describe the task -hello_world_task = SequentialTask( - processes = [install, hello_world], - resources = Resources(cpu = 1, ram = 1*MB, disk=8*MB)) - -jobs = [ - Service(cluster = 'devcluster', - environment = 'devel', - role = 'www-data', - name = 'hello_world', - task = hello_world_task) -] -``` - -There is a lot going on in that configuration file: - -1. From a "big picture" viewpoint, it first defines two -Processes. Then it defines a Task that runs the two Processes in the -order specified in the Task definition, as well as specifying what -computational and memory resources are available for them. Finally, -it defines a Job that will schedule the Task on available and suitable -machines. This Job is the sole member of a list of Jobs; you can -specify more than one Job in a config file. - -2. At the Process level, it specifies how to get your code into the -local sandbox in which it will run. It then specifies how the code is -actually run once the second Process starts. - -For more about Aurora configuration files, see the [Configuration -Tutorial](configuration-tutorial.md) and the [Aurora + Thermos -Reference](configuration-reference.md) (preferably after finishing this -tutorial). - - -## Creating the Job - -We're ready to launch our job! To do so, we use the Aurora Client to -issue a Job creation request to the Aurora scheduler. - -Many Aurora Client commands take a *job key* argument, which uniquely -identifies a Job. A job key consists of four parts, each separated by a -"/". The four parts are `<cluster>/<role>/<environment>/<jobname>` -in that order: - -* Cluster refers to the name of a particular Aurora installation. -* Role names are user accounts existing on the slave machines. If you -don't know what accounts are available, contact your sysadmin. -* Environment names are namespaces; you can count on `test`, `devel`, -`staging` and `prod` existing. -* Jobname is the custom name of your job. - -When comparing two job keys, if any of the four parts is different from -its counterpart in the other key, then the two job keys identify two separate -jobs. If all four values are identical, the job keys identify the same job. - -The `clusters.json` [client configuration](client-cluster-configuration.md) -for the Aurora scheduler defines the available cluster names. -For Vagrant, from the top-level of your Aurora repository clone, do: - - $ vagrant ssh - -Followed by: - - vagrant@aurora:~$ cat /etc/aurora/clusters.json - -You'll see something like the following. The `name` value shown here, corresponds to a job key's cluster value. - -```javascript -[{ - "name": "devcluster", - "zk": "192.168.33.7", - "scheduler_zk_path": "/aurora/scheduler", - "auth_mechanism": "UNAUTHENTICATED", - "slave_run_directory": "latest", - "slave_root": "/var/lib/mesos" -}] -``` - -The Aurora Client command that actually runs our Job is `aurora job create`. It creates a Job as -specified by its job key and configuration file arguments and runs it. 
- - aurora job create <cluster>/<role>/<environment>/<jobname> <config_file> - -Or for our example: - - aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora - -After entering our virtual machine using `vagrant ssh`, this returns: - - vagrant@aurora:~$ aurora job create devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora - INFO] Creating job hello_world - INFO] Checking status of devcluster/www-data/devel/hello_world - Job create succeeded: job url=http://aurora.local:8081/scheduler/www-data/devel/hello_world - - -## Watching the Job Run - -Now that our job is running, let's see what it's doing. Access the -scheduler web interface at `http://$scheduler_hostname:$scheduler_port/scheduler` -Or when using `vagrant`, `http://192.168.33.7:8081/scheduler` -First we see what Jobs are scheduled: - - - -Click on your user name, which in this case was `www-data`, and we see the Jobs associated -with that role: - - - -If you click on your `hello_world` Job, you'll see: - - - -Oops, looks like our first job didn't quite work! The task is temporarily throttled for -having failed on every attempt of the Aurora scheduler to run it. We have to figure out -what is going wrong. - -On the Completed tasks tab, we see all past attempts of the Aurora scheduler to run our job. - - - -We can navigate to the Task page of a failed run by clicking on the host link. - - - -Once there, we see that the `hello_world` process failed. The Task page -captures the standard error and standard output streams and makes them available. -Clicking through to `stderr` on the failed `hello_world` process, we see what happened. - - - -It looks like we made a typo in our Python script. We wanted `xrange`, -not `xrang`. Edit the `hello_world.py` script to use the correct function -and save it as `hello_world_v2.py`. Then update the `hello_world.aurora` -configuration to the newest version. - -In order to try again, we can now instruct the scheduler to update our job: - - vagrant@aurora:~$ aurora update start devcluster/www-data/devel/hello_world /vagrant/hello_world.aurora - INFO] Starting update for: hello_world - Job update has started. View your update progress at http://aurora.local:8081/scheduler/www-data/devel/hello_world/update/8ef38017-e60f-400d-a2f2-b5a8b724e95b - -This time, the task comes up. - - - -By again clicking on the host, we inspect the Task page, and see that the -`hello_world` process is running. - - - -We then inspect the output by clicking on `stdout` and see our process' -output: - - - -## Cleanup - -Now that we're done, we kill the job using the Aurora client: - - vagrant@aurora:~$ aurora job killall devcluster/www-data/devel/hello_world - INFO] Killing tasks for job: devcluster/www-data/devel/hello_world - INFO] Instances to be killed: [0] - Successfully killed instances [0] - Job killall succeeded - -The job page now shows the `hello_world` tasks as completed. - - - -## Next Steps - -Now that you've finished this Tutorial, you should read or do the following: - -- [The Aurora Configuration Tutorial](configuration-tutorial.md), which provides more examples - and best practices for writing Aurora configurations. You should also look at - the [Aurora + Thermos Configuration Reference](configuration-reference.md). -- The [Aurora User Guide](user-guide.md) provides an overview of how Aurora, Mesos, and - Thermos work "under the hood". -- Explore the Aurora Client - use `aurora -h`, and read the - [Aurora Client Commands](client-commands.md) document. 
http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/user-guide.md ---------------------------------------------------------------------- diff --git a/docs/user-guide.md b/docs/user-guide.md deleted file mode 100644 index 656296c..0000000 --- a/docs/user-guide.md +++ /dev/null @@ -1,244 +0,0 @@ -Aurora User Guide ------------------ - -- [Overview](#user-content-overview) -- [Job Lifecycle](#user-content-job-lifecycle) - - [Task Updates](#user-content-task-updates) - - [HTTP Health Checking](#user-content-http-health-checking) -- [Service Discovery](#user-content-service-discovery) -- [Configuration](#user-content-configuration) -- [Creating Jobs](#user-content-creating-jobs) -- [Interacting With Jobs](#user-content-interacting-with-jobs) - -Overview --------- - -This document gives an overview of how Aurora works under the hood. -It assumes you've already worked through the "hello world" example -job in the [Aurora Tutorial](tutorial.md). Specifics of how to use Aurora are **not** - given here, but pointers to documentation about how to use Aurora are -provided. - -Aurora is a Mesos framework used to schedule *jobs* onto Mesos. Mesos -cares about individual *tasks*, but typical jobs consist of dozens or -hundreds of task replicas. Aurora provides a layer on top of Mesos with -its `Job` abstraction. An Aurora `Job` consists of a task template and -instructions for creating near-identical replicas of that task (modulo -things like "instance id" or specific port numbers which may differ from -machine to machine). - -How many tasks make up a Job is complicated. On a basic level, a Job consists of -one task template and instructions for creating near-idential replicas of that task -(otherwise referred to as "instances" or "shards"). - -However, since Jobs can be updated on the fly, a single Job identifier or *job key* -can have multiple job configurations associated with it. - -For example, consider when I have a Job with 4 instances that each -request 1 core of cpu, 1 GB of RAM, and 1 GB of disk space as specified -in the configuration file `hello_world.aurora`. I want to -update it so it requests 2 GB of RAM instead of 1. I create a new -configuration file to do that called `new_hello_world.aurora` and -issue a `aurora update start <job_key_value>/0-1 new_hello_world.aurora` -command. - -This results in instances 0 and 1 having 1 cpu, 2 GB of RAM, and 1 GB of disk space, -while instances 2 and 3 have 1 cpu, 1 GB of RAM, and 1 GB of disk space. If instance 3 -dies and restarts, it restarts with 1 cpu, 1 GB RAM, and 1 GB disk space. - -So that means there are two simultaneous task configurations for the same Job -at the same time, just valid for different ranges of instances. - -This isn't a recommended pattern, but it is valid and supported by the -Aurora scheduler. This most often manifests in the "canary pattern" where -instance 0 runs with a different configuration than instances 1-N to test -different code versions alongside the actual production job. - -A task can merely be a single *process* corresponding to a single -command line, such as `python2.6 my_script.py`. However, a task can also -consist of many separate processes, which all run within a single -sandbox. For example, running multiple cooperating agents together, -such as `logrotate`, `installer`, master, or slave processes. This is -where Thermos comes in. 
While Aurora provides a `Job` abstraction on top of Mesos `Task`s, Thermos provides a
`Process` abstraction underneath Mesos `Task`s and serves as part of the Aurora
framework's executor.

You define `Job`s, `Task`s, and `Process`es in a configuration file. Configuration files
are written in Python and make use of the Pystachio templating language. They end in a
`.aurora` extension.

Pystachio is a type-checked dictionary templating library.

> TL;DR
>
> - Aurora manages jobs made of tasks.
> - Mesos manages tasks made of processes.
> - Thermos manages processes.
> - All are defined in a `.aurora` configuration file.

Each `Task` has a *sandbox* created when the `Task` starts and garbage collected when it
finishes. All of a `Task`'s processes run in its sandbox, so processes can share state by
using a shared current working directory.

The sandbox garbage collection policy considers many factors, most importantly age and
size. It makes a best-effort attempt to keep sandboxes around as long as possible
post-task so that service owners can inspect data and logs, should the `Task` have
completed abnormally. But you can't design your applications assuming sandboxes will be
around forever; if you need logs or other state to outlive the sandbox, build log saving
or other checkpointing mechanisms directly into your application or into your `Job`
description.


Job Lifecycle
-------------

`Job`s and their `Task`s have various states that are described in the
[Task Lifecycle](task-lifecycle.md). However, in day-to-day use you'll primarily be
concerned with launching new jobs and updating existing ones.


### Task Updates

`Job` configurations can be updated at any point in their lifecycle. Usually updates are
done incrementally using a process called a *rolling upgrade*, in which Tasks are upgraded
in small groups, one group at a time. Updates are done using various Aurora client
commands.

For a configuration update, the Aurora client calculates the required changes by comparing
the current job config state with the new desired job config. It then starts a rolling,
batched update process by going through every batch and performing these operations:

- If an instance is present in the scheduler but isn't in the new config, that instance is
  killed.
- If an instance is not present in the scheduler but is present in the new config, the
  instance is created.
- If an instance is present in both the scheduler and the new config, the client diffs
  both task configs. If it detects any changes, it performs an instance update by killing
  the instance running the old config and adding an instance with the new config.

The Aurora client continues through the instance list until all tasks are updated, in
`RUNNING`, and healthy for a configurable amount of time. If the client determines the
update is not going well (a percentage of health checks have failed), it cancels the
update.

Update cancellation runs a procedure similar to the update sequence described above, but
in reverse order. New instance configs are swapped back to the old instance configs, and
batch updates proceed backwards from the point where the update failed. For example, if
the batches were (0,1,2), (3,4,5), (6,7,8) and the failure happened on instance 8, the
rollback runs in the order (8,7,6), (5,4,3), (2,1,0).

### HTTP Health Checking

The Executor implements a protocol for rudimentary control of a task via HTTP. Tasks
subscribe to this protocol by declaring a port named `health`.
Take for example this configuration snippet:

    nginx = Process(
      name = 'nginx',
      cmdline = './run_nginx.sh -port {{thermos.ports[health]}}')

When this Process is included in a job, the job will be allocated a port, and the command
line will be replaced with something like:

    ./run_nginx.sh -port 42816

where 42816 happens to be the allocated port. Typically, the Executor monitors Processes
within a task only by liveness of the forked process. However, when a `health` port is
allocated, it also sends periodic HTTP health checks. A task requesting a `health` port
must handle the following requests:

| HTTP request  | Description                           |
| ------------- | ------------------------------------- |
| `GET /health` | Inquires whether the task is healthy. |

Please see the
[configuration reference](configuration-reference.md#user-content-healthcheckconfig-objects)
for configuration options for this feature.

#### Snoozing Health Checks

If you need to pause your health check, you can do so by touching a file inside your
sandbox named `.healthchecksnooze`.

As long as that file is present, health checks are disabled, enabling users to gather core
dumps or take other measurements without worrying about Aurora's health checks killing
their process.

WARNING: Remember to remove this file when you are done, otherwise your instance will have
permanently disabled health checks.


Configuration
-------------

You define and configure your Jobs (and their Tasks and Processes) in Aurora configuration
files. Their filenames end with the `.aurora` suffix, and you write them in Python making
use of the Pystachio templating language, along with specific Aurora, Mesos, and Thermos
commands and methods. See the [Configuration Guide and Reference](configuration-reference.md)
and [Configuration Tutorial](configuration-tutorial.md).

Service Discovery
-----------------

It is possible for the Aurora executor to announce tasks into ServerSets for the purpose
of service discovery. ServerSets use the ZooKeeper
[group membership pattern](http://zookeeper.apache.org/doc/trunk/recipes.html#sc_outOfTheBox),
of which there are several reference implementations:

  - [C++](https://github.com/apache/mesos/blob/master/src/zookeeper/group.cpp)
  - [Java](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/zookeeper/ServerSetImpl.java#L221)
  - [Python](https://github.com/twitter/commons/blob/master/src/python/twitter/common/zookeeper/serverset/serverset.py#L51)

These can also be used natively in Finagle via the
[ZookeeperServerSetCluster](https://github.com/twitter/finagle/blob/master/finagle-serversets/src/main/scala/com/twitter/finagle/zookeeper/ZookeeperServerSetCluster.scala).

For more information about how to configure announcing, see the
[Configuration Reference](configuration-reference.md).

Creating Jobs
-------------

You create and manipulate Aurora Jobs with the Aurora client, whose command-line commands
all start with `aurora`. See [Aurora Client Commands](client-commands.md) for details
about the Aurora client.

Interacting With Jobs
---------------------

You interact with Aurora jobs either via:

- Read-only Web UIs

  Part of the output from creating a new Job is a URL for the Job's scheduler UI page.
  For example:

      vagrant@precise64:~$ aurora job create devcluster/www-data/prod/hello \
        /vagrant/examples/jobs/hello_world.aurora
      INFO] Creating job hello
      INFO] Response from scheduler: OK (message: 1 new tasks pending for job www-data/prod/hello)
      INFO] Job url: http://precise64:8081/scheduler/www-data/prod/hello

  The "Job url" goes to the Job's scheduler UI page. To go to the overall scheduler UI
  page, stop at the "scheduler" part of the URL, in this case `http://precise64:8081/scheduler`.

  You can also reach the scheduler UI page via the client command `aurora job open`:

      aurora job open [<cluster>[/<role>[/<env>/<job_name>]]]

  If only the cluster is specified, it goes directly to that cluster's scheduler main page.
  If the role is specified, it goes to the top-level role page. If the full job key is
  specified, it goes directly to the job page, where you can inspect individual tasks.

  Once you click through to a role page, you see Jobs arranged separately as pending jobs,
  active jobs, and finished jobs. Jobs are arranged by role, typically a service account
  for production jobs and user accounts for test or development jobs.

- The Aurora client

  See [client commands](client-commands.md).

http://git-wip-us.apache.org/repos/asf/aurora/blob/f28f41a7/docs/vagrant.md
----------------------------------------------------------------------
diff --git a/docs/vagrant.md b/docs/vagrant.md
deleted file mode 100644
index 3bc201f..0000000
--- a/docs/vagrant.md
+++ /dev/null
@@ -1,137 +0,0 @@
Getting Started
===============

This document shows you how to configure a complete cluster using a virtual machine. This
setup replicates a real cluster on your development machine as closely as possible. After
you complete the steps outlined here, you will be ready to create and run your first
Aurora job.

The following sections describe these steps in detail:

1. [Overview](#user-content-overview)
1. [Install VirtualBox and Vagrant](#user-content-install-virtualbox-and-vagrant)
1. [Clone the Aurora repository](#user-content-clone-the-aurora-repository)
1. [Start the local cluster](#user-content-start-the-local-cluster)
1. [Log onto the VM](#user-content-log-onto-the-vm)
1. [Run your first job](#user-content-run-your-first-job)
1. [Rebuild components](#user-content-rebuild-components)
1. [Shut down or delete your local cluster](#user-content-shut-down-or-delete-your-local-cluster)
1. [Troubleshooting](#user-content-troubleshooting)


Overview
--------

The Aurora distribution includes a set of scripts that enable you to create a local
cluster on your development machine. These scripts use [Vagrant](https://www.vagrantup.com/)
and [VirtualBox](https://www.virtualbox.org/) to run and configure a virtual machine. Once
the virtual machine is running, the scripts install and initialize Aurora and any required
components to create the local cluster.


Install VirtualBox and Vagrant
------------------------------

First, download and install [VirtualBox](https://www.virtualbox.org/) on your development
machine.

Then download and install [Vagrant](https://www.vagrantup.com/). To verify that the
installation was successful, open a terminal window and type the `vagrant` command. You
should see a list of common commands for this tool.
Clone the Aurora repository
---------------------------

To obtain the Aurora source distribution, clone its Git repository using the following
command:

    git clone git://git.apache.org/aurora.git


Start the local cluster
-----------------------

Now change into the `aurora/` directory, which contains the Aurora source code and other
scripts and tools:

    cd aurora/

To start the local cluster, type the following command:

    vagrant up

This command uses the configuration scripts in the Aurora distribution to:

* Download a Linux system image.
* Start a virtual machine (VM) and configure it.
* Install the required build tools on the VM.
* Install Aurora's requirements (like [Mesos](http://mesos.apache.org/) and
  [Zookeeper](http://zookeeper.apache.org/)) on the VM.
* Build and install Aurora from source on the VM.
* Start Aurora's services on the VM.

This process takes several minutes to complete.

To verify that Aurora is running on the cluster, visit the following URLs:

* Scheduler - http://192.168.33.7:8081
* Observer - http://192.168.33.7:1338
* Mesos Master - http://192.168.33.7:5050
* Mesos Slave - http://192.168.33.7:5051


Log onto the VM
---------------

To SSH into the VM, run the following command on your development machine:

    vagrant ssh

To verify that Aurora is installed in the VM, type the `aurora` command. You should see a
list of arguments and possible commands.

The `/vagrant` directory on the VM is mapped to the `aurora/` local directory from which
you started the cluster. You can edit files inside this directory on your development
machine and access them from the VM under `/vagrant`.

A pre-installed `clusters.json` file refers to your local cluster as `devcluster`, which
you will use in client commands.


Run your first job
------------------

Now that your cluster is up and running, you are ready to define and run your first job in
Aurora. For more information, see the [Aurora Tutorial](tutorial.md).


Rebuild components
------------------

If you are changing Aurora code and would like to rebuild a component, you can use the
`aurorabuild` command on the VM to build and restart a component. This is considerably
faster than destroying and rebuilding your VM.

`aurorabuild` accepts a list of components to build and update. For example, to rebuild
and restart the client:

    vagrant ssh -c 'aurorabuild client'

To get a list of supported components, invoke the `aurorabuild` command with no arguments.


Shut down or delete your local cluster
--------------------------------------

To shut down your local cluster, run the `vagrant halt` command on your development
machine. To start it again, run the `vagrant up` command.

Once you are finished with your local cluster, or if you would like to start from scratch,
you can use the `vagrant destroy` command to shut down and delete the virtual machine and
its file system.


Troubleshooting
---------------

Most Vagrant-related problems can be fixed with the following steps:

* Destroy the Vagrant environment with `vagrant destroy`.
* Kill any orphaned VMs (see AURORA-499) with the VirtualBox UI or the `VBoxManage`
  command-line tool.
* Clean the repository of build artifacts and other intermediate output with `git clean -fdx`.
* Bring up the Vagrant environment again with `vagrant up`.
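As a quick sanity check after `vagrant up`, or while troubleshooting, you can poll the endpoints listed above from your development machine. The script below is only an illustrative helper, not something shipped with Aurora; the endpoint list simply mirrors the URLs given earlier in this document, and any failure just means that service did not answer over HTTP.

```python
# check_cluster.py -- hypothetical helper, not part of the Aurora repo.
# Polls the local-cluster endpoints listed in this document and reports
# which ones respond over HTTP.
import urllib2

ENDPOINTS = {
    'Scheduler': 'http://192.168.33.7:8081',
    'Observer': 'http://192.168.33.7:1338',
    'Mesos Master': 'http://192.168.33.7:5050',
    'Mesos Slave': 'http://192.168.33.7:5051',
}

def main():
    for name, url in sorted(ENDPOINTS.items()):
        try:
            urllib2.urlopen(url, timeout=5)
            print('%-12s OK      %s' % (name, url))
        except Exception as e:  # connection refused, timeout, HTTP error, ...
            print('%-12s FAILED  %s (%s)' % (name, url, e))

if __name__ == '__main__':
    main()
```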
