Author: jcohen
Date: Mon Oct 26 18:57:24 2015
New Revision: 1710677
URL: http://svn.apache.org/viewvc?rev=1710677&view=rev
Log:
Fix the Twitter handle on community page.
Modified:
aurora/site/publish/community/index.html
aurora/site/publish/documentation/latest/client-commands/index.html
aurora/site/publish/documentation/latest/configuration-reference/index.html
aurora/site/publish/documentation/latest/configuration-tutorial/index.html
aurora/site/publish/documentation/latest/cron-jobs/index.html
aurora/site/publish/documentation/latest/deploying-aurora-scheduler/index.html
aurora/site/publish/documentation/latest/developing-aurora-client/index.html
aurora/site/publish/documentation/latest/developing-aurora-scheduler/index.html
aurora/site/publish/documentation/latest/index.html
aurora/site/publish/documentation/latest/monitoring/index.html
aurora/site/publish/documentation/latest/sla/index.html
aurora/site/publish/documentation/latest/storage-config/index.html
aurora/site/publish/documentation/latest/test-resource-generation/index.html
aurora/site/publish/documentation/latest/vagrant/index.html
aurora/site/publish/sitemap.xml
aurora/site/source/community.html.md
aurora/site/source/documentation/latest.html.md
aurora/site/source/documentation/latest/client-commands.md
aurora/site/source/documentation/latest/configuration-reference.md
aurora/site/source/documentation/latest/configuration-tutorial.md
aurora/site/source/documentation/latest/cron-jobs.md
aurora/site/source/documentation/latest/deploying-aurora-scheduler.md
aurora/site/source/documentation/latest/developing-aurora-client.md
aurora/site/source/documentation/latest/developing-aurora-scheduler.md
aurora/site/source/documentation/latest/monitoring.md
aurora/site/source/documentation/latest/sla.md
aurora/site/source/documentation/latest/storage-config.md
aurora/site/source/documentation/latest/test-resource-generation.md
aurora/site/source/documentation/latest/vagrant.md
Modified: aurora/site/publish/community/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/community/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/community/index.html (original)
+++ aurora/site/publish/community/index.html Mon Oct 26 18:57:24 2015
@@ -73,7 +73,7 @@
</div>
<div class="col-md-4">
<h3>Follow the Project</h3>
- <a class="twitter-timeline" href="https://twitter.com/ApacheAurora" data-widget-id="512693636127920129">Tweets by @ApacheMesos</a>
+ <a class="twitter-timeline" href="https://twitter.com/ApacheAurora" data-widget-id="512693636127920129">Tweets by @ApacheAurora</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
</div>
</div>
Modified: aurora/site/publish/documentation/latest/client-commands/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/client-commands/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/client-commands/index.html (original)
+++ aurora/site/publish/documentation/latest/client-commands/index.html Mon Oct 26 18:57:24 2015
@@ -360,8 +360,7 @@ configuration file, and displays the par
<pre class="highlight text">aurora quota get CLUSTER/ROLE
</pre>
<p>Prints the production quota allocated to the role’s value at the given
-cluster. Only non-<a href="deploying-aurora-scheduler.md#dedicated-attribute">dedicated</a>
-<a href="configuration-reference.md#job-objects">production</a> jobs consume quota.</p>
+cluster.</p>
<h3 id="finding-a-job-on-web-ui">Finding a Job on Web UI</h3>
Modified: aurora/site/publish/documentation/latest/configuration-reference/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/configuration-reference/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/configuration-reference/index.html (original)
+++ aurora/site/publish/documentation/latest/configuration-reference/index.html Mon Oct 26 18:57:24 2015
@@ -81,12 +81,10 @@
<ul>
<li><a href="#job-objects">Job Objects</a></li>
<li><a href="#services">Services</a></li>
-<li><a href="#revocable-jobs">Revocable Jobs</a></li>
<li><a href="#updateconfig-objects">UpdateConfig Objects</a></li>
<li><a href="#healthcheckconfig-objects">HealthCheckConfig Objects</a></li>
<li><a href="#announcer-objects">Announcer Objects</a></li>
<li><a href="#container">Container Objects</a></li>
-<li><a href="#lifecycleconfig-objects">LifecycleConfig Objects</a></li>
</ul></li>
<li><a href="#specifying-scheduling-constraints">Specifying Scheduling Constraints</a></li>
<li><a href="#template-namespaces">Template Namespaces</a>
@@ -430,7 +428,7 @@ ordering constraints.</p>
<h3 id="resource-object">Resource Object</h3>
<p>Specifies the amount of CPU, Ram, and disk resources the task needs. See the
-<a href="/documentation/latest/resources/">Resource Isolation document</a> for suggested values and to understand how
+<a href="/documentation/latest/resource-isolation/">Resource Isolation document</a> for suggested values and to understand how
resources are allocated.</p>
<table><thead>
@@ -541,7 +539,7 @@ resources are allocated.</p>
<tr>
<td><code>production</code></td>
<td style="text-align: center">Boolean</td>
-<td>Whether or not this is a production task that may <a href="resources.md#task-preemption">preempt</a> other tasks (Default: False). Production job role must have the appropriate <a href="resources.md#resource-quota">quota</a>.</td>
+<td>Whether or not this is a production task backed by quota (Default: False). Production jobs may preempt any non-production job, and may only be preempted by production jobs in the same role and of higher priority. To run jobs at this level, the job role must have the appropriate quota. To grant quota to a particular role in production, operators use the <code>aurora_admin set_quota</code> command.</td>
</tr>
<tr>
<td><code>health_check_config</code></td>
@@ -553,16 +551,6 @@ resources are allocated.</p>
<td style="text-align: center"><code>Container</code> object</td>
<td>An optional container to run all processes inside of.</td>
</tr>
-<tr>
-<td><code>lifecycle</code></td>
-<td style="text-align: center"><code>LifecycleConfig</code> object</td>
-<td>An optional task lifecycle configuration that dictates commands to be executed on startup/teardown. HTTP lifecycle is enabled by default if the “health” port is requested. See <a href="#lifecycleconfig-objects">LifecycleConfig Objects</a> for more information.</td>
-</tr>
-<tr>
-<td><code>tier</code></td>
-<td style="text-align: center">String</td>
-<td>Task tier type. When set to <code>revocable</code> requires the task to run with Mesos revocable resources. This is work <a href="https://issues.apache.org/jira/browse/AURORA-1343">in progress</a> and is currently only supported for the revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details.</td>
-</tr>
</tbody></table>
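The preemption rule stated in the `production` row above can be read as a simple predicate. The sketch below is a hypothetical illustration, not Aurora scheduler code: the `Task` dataclass and `can_preempt` function are invented names, and because the text does not say how non-production tasks preempt one another, the sketch conservatively disallows that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in holding only the fields the preemption rule uses.
    role: str
    production: bool
    priority: int = 0


def can_preempt(preemptor: Task, victim: Task) -> bool:
    """Return True if `preemptor` may preempt `victim` under the rule quoted above."""
    if not victim.production:
        # Production tasks may preempt any non-production task. Preemption
        # between two non-production tasks is not covered by the text, so
        # this sketch conservatively disallows it.
        return preemptor.production
    # A production victim may only be preempted by a production task from
    # the same role with a strictly higher priority.
    return (
        preemptor.production
        and preemptor.role == victim.role
        and preemptor.priority > victim.priority
    )
```

For example, a production task in role `b` can displace any non-production task, but it cannot displace a production task in role `a`, whatever its priority.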
<h3 id="services">Services</h3>
@@ -575,21 +563,6 @@ Jobs without the service bit set only re
<code>max_task_failures</code> times and only if they terminated unsuccessfully
either due to human error or machine failure.</p>
-<h3 id="revocable-jobs">Revocable Jobs</h3>
-
-<p><strong>WARNING</strong>: This feature is currently in alpha status. Do not use it in production clusters!</p>
-
-<p>Mesos <a href="http://mesos.apache.org/documentation/latest/oversubscription/">supports a concept of revocable tasks</a>
-by oversubscribing machine resources by the amount deemed safe to not affect the existing
-non-revocable tasks. Aurora now supports revocable jobs via a <code>tier</code> setting set to <code>revocable</code>
-value.</p>
-
-<p>More implementation details in this <a href="https://issues.apache.org/jira/browse/AURORA-1343">ticket</a>.</p>
-
-<p>Scheduler must be <a href="deploying-aurora-scheduler.md#configuring-resource-oversubscription">configured</a>
-to receive revocable offers from Mesos and accept revocable jobs. If not configured properly
-revocable tasks will never get assigned to hosts and will stay in PENDING.</p>
-
<h3 id="updateconfig-objects">UpdateConfig Objects</h3>
<p>Parameters for controlling the rate and policy of rolling updates.</p>
@@ -665,29 +638,14 @@ revocable tasks will never get assigned
<td>Interval on which to check the task’s health via HTTP. (Default: 10)</td>
</tr>
<tr>
-<td><code>max_consecutive_failures</code></td>
-<td style="text-align: center">Integer</td>
-<td>Maximum number of consecutive failures that tolerated before considering a task unhealthy (Default: 0)</td>
-</tr>
-<tr>
<td><code>timeout_secs</code></td>
<td style="text-align: center">Integer</td>
<td>HTTP request timeout. (Default: 1)</td>
</tr>
<tr>
-<td><code>endpoint</code></td>
-<td style="text-align: center">String</td>
-<td>HTTP endpoint to check (Default: /health)</td>
-</tr>
-<tr>
-<td><code>expected_response</code></td>
-<td style="text-align: center">String</td>
-<td>If not empty, fail the health check if the response differs. Case insensitive. (Default: ok)</td>
-</tr>
-<tr>
-<td><code>expected_response_code</code></td>
+<td><code>max_consecutive_failures</code></td>
<td style="text-align: center">Integer</td>
-<td>If not zero, fail the health check if the response code differs. (Default: 0)</td>
+<td>Maximum number of consecutive failures that tolerated before considering a task unhealthy (Default: 0)</td>
</tr>
</tbody></table>
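The `max_consecutive_failures` row above describes a counter that is reset by any passing check. A minimal sketch of that bookkeeping, assuming "tolerated" means the task stays healthy until the count exceeds the limit (so the default of 0 flags the very first failure); `ConsecutiveFailureTracker` is a hypothetical name, and polling intervals and timeouts are not modelled.

```python
class ConsecutiveFailureTracker:
    """Hypothetical model of the max_consecutive_failures threshold."""

    def __init__(self, max_consecutive_failures: int = 0) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check result; return True while the task counts as healthy."""
        if check_passed:
            # Any passing check clears the streak of failures.
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures <= self.max_consecutive_failures
```

With a limit of 2, the sequence fail, fail, pass, fail, fail, fail reports healthy until the third failure in a row.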
@@ -776,96 +734,8 @@ guarantees should they be needed.</p>
<td style="text-align: center">String</td>
<td>The name of the docker image to execute. If the image does not exist locally it will be pulled with <code>docker pull</code>.</td>
</tr>
-<tr>
-<td><code>parameters</code></td>
-<td style="text-align: center">List(Parameter)</td>
-<td>Additional parameters to pass to the docker containerizer.</td>
-</tr>
</tbody></table>
-<h3 id="docker-parameter-object">Docker Parameter Object</h3>
-
-<p>Docker CLI parameters. This needs to be enabled by the scheduler <code>enable_docker_parameters</code> option.
-See <a href="https://docs.docker.com/reference/commandline/run/">Docker Command Line Reference</a> for valid parameters. </p>
-
-<table><thead>
-<tr>
-<th>param</th>
-<th style="text-align: center">type</th>
-<th>description</th>
-</tr>
-</thead><tbody>
-<tr>
-<td><code>name</code></td>
-<td style="text-align: center">String</td>
-<td>The name of the docker parameter. E.g. volume</td>
-</tr>
-<tr>
-<td><code>value</code></td>
-<td style="text-align: center">String</td>
-<td>The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw</td>
-</tr>
-</tbody></table>
-
-<h3 id="lifecycleconfig-objects">LifecycleConfig Objects</h3>
-
-<p><em>Note: The only lifecycle configuration supported is the HTTP lifecycle via the HTTPLifecycleConfig.</em></p>
-
-<table><thead>
-<tr>
-<th>param</th>
-<th style="text-align: center">type</th>
-<th>description</th>
-</tr>
-</thead><tbody>
-<tr>
-<td><code>http</code></td>
-<td style="text-align: center">HTTPLifecycleConfig</td>
-<td>Configure the lifecycle manager to send lifecycle commands to the task via HTTP.</td>
-</tr>
-</tbody></table>
-
-<h3 id="httplifecycleconfig-objects">HTTPLifecycleConfig Objects</h3>
-
-<table><thead>
-<tr>
-<th>param</th>
-<th style="text-align: center">type</th>
-<th>description</th>
-</tr>
-</thead><tbody>
-<tr>
-<td><code>port</code></td>
-<td style="text-align: center">String</td>
-<td>The named port to send POST commands (Default: health)</td>
-</tr>
-<tr>
-<td><code>graceful_shutdown_endpoint</code></td>
-<td style="text-align: center">String</td>
-<td>Endpoint to hit to indicate that a task should gracefully shutdown. (Default: /quitquitquit)</td>
-</tr>
-<tr>
-<td><code>shutdown_endpoint</code></td>
-<td style="text-align: center">String</td>
-<td>Endpoint to hit to give a task its final warning before being killed. (Default: /abortabortabort)</td>
-</tr>
-</tbody></table>
-
-<h4 id="gracefulshutdownendpoint">graceful<em>shutdown</em>endpoint</h4>
-
-<p>If the Job is listening on the port as specified by the HTTPLifecycleConfig
-(default: <code>health</code>), a HTTP POST request will be sent over localhost to this
-endpoint to request that the task gracefully shut itself down. This is a
-courtesy call before the <code>shutdown_endpoint</code> is invoked a fixed amount of
-time later.</p>
-
-<h4 id="shutdown_endpoint">shutdown_endpoint</h4>
-
-<p>If the Job is listening on the port as specified by the HTTPLifecycleConfig
-(default: <code>health</code>), a HTTP POST request will be sent over localhost to this
-endpoint to request as a final warning before being shut down. If the task
-does not shut down on its own after this, it will be forcefully killed</p>
-
<h1 id="specifying-scheduling-constraints">Specifying Scheduling Constraints</h1>
<p>Most users will not need to specify constraints explicitly, as the
Modified: aurora/site/publish/documentation/latest/configuration-tutorial/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/configuration-tutorial/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/configuration-tutorial/index.html (original)
+++ aurora/site/publish/documentation/latest/configuration-tutorial/index.html Mon Oct 26 18:57:24 2015
@@ -647,7 +647,12 @@ instances/replicas/shards of the Job&rsq
for which higher values may preempt Tasks from Jobs with lower
values.</p></li>
<li><p><code>production</code>: a Boolean, defaulting to <code>False</code>, specifying that this
-is a <a href="configuration-reference.md#job-objects">production</a> job.</p></li>
+is a production job backed by quota. Tasks from production Jobs may
+preempt tasks from any non-production job, and may only be preempted
+by tasks from production jobs in the same role with higher
+priority. <strong>WARNING</strong>: To run Jobs at this level, the Job role must
+have the appropriate quota. To grant quota to a particular role in
+production, operators use the <code>aurora_admin set_quota</code> command.</p></li>
</ul>
<p>The final three Job attributes each take an object as their value.</p>
Modified: aurora/site/publish/documentation/latest/cron-jobs/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/cron-jobs/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/cron-jobs/index.html (original)
+++ aurora/site/publish/documentation/latest/cron-jobs/index.html Mon Oct 26 18:57:24 2015
@@ -118,7 +118,7 @@ grow faster than they can process it.</p
<p>Unlike with services, which aurora will always re-execute regardless of exit status, instances of
cron jobs retry according to the <code>max_task_failures</code> attribute of the
-<a href="configuration-reference.md#task-objects">Task</a> object. To get “run-until-success” semantics,
+<a href="configuration-reference.md#task-objects">Task</a> object. To get “run-until-failure” semantics,
set <code>max_task_failures</code> to <code>-1</code>.</p>
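The retry behaviour described above can be modelled as a loop with a failure budget. This is a hypothetical sketch, not Aurora code: `run_with_failure_limit` and its `safety_cap` parameter are invented for illustration, and it assumes that `max_task_failures` counts failures after which the instance is considered failed, with `-1` lifting the limit so the instance keeps being re-run until it succeeds.

```python
from typing import Callable, Tuple


def run_with_failure_limit(
    attempt: Callable[[], bool],
    max_task_failures: int,
    safety_cap: int = 1000,
) -> Tuple[bool, int]:
    """Re-run `attempt` until it succeeds or the failure budget is spent.

    Returns (succeeded, attempts). A max_task_failures of -1 means failures
    are unlimited; safety_cap exists only so this sketch always terminates.
    """
    failures = 0
    attempts = 0
    while attempts < safety_cap:
        attempts += 1
        if attempt():
            return True, attempts
        failures += 1
        if max_task_failures != -1 and failures >= max_task_failures:
            return False, attempts
    return False, attempts
```

With a budget of 1, an instance that fails once is not retried; with `-1`, the same instance is retried until an attempt succeeds.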
<h2 id="interacting-with-cron-jobs-via-the-aurora-cli">Interacting with cron jobs via the Aurora CLI</h2>
Modified: aurora/site/publish/documentation/latest/deploying-aurora-scheduler/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/deploying-aurora-scheduler/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/deploying-aurora-scheduler/index.html (original)
+++ aurora/site/publish/documentation/latest/deploying-aurora-scheduler/index.html Mon Oct 26 18:57:24 2015
@@ -60,8 +60,6 @@ machines. This guide helps you get the
<li><a href="#storage-performance-considerations">Storage Performance Considerations</a></li>
<li><a href="#network-considerations">Network considerations</a></li>
<li><a href="#considerations-for-running-jobs-in-docker">Considerations for running jobs in docker</a></li>
-<li><a href="#security-considerations">Security Considerations</a></li>
-<li><a href="#configuring-resource-oversubscription">Configuring Resource Oversubscription</a></li>
</ul></li>
<li><a href="#running-aurora">Running Aurora</a>
@@ -76,11 +74,6 @@ machines. This guide helps you get the
<li><a href="#example">Example</a></li>
</ul></li>
</ul></li>
-<li><a href="#best-practices">Best practices</a>
-
-<ul>
-<li><a href="#diversity">Diversity</a></li>
-</ul></li>
<li><a href="#common-problems">Common problems</a>
<ul>
@@ -90,12 +83,9 @@ machines. This guide helps you get the
<li><a href="#scheduler-not-registered">Scheduler not registered</a></li>
<li><a href="#symptoms-1">Symptoms</a></li>
<li><a href="#solution-1">Solution</a></li>
-</ul></li>
-<li><a href="#changing-scheduler-quorum-size">Changing Scheduler Quorum Size</a>
-
-<ul>
-<li><a href="#preparation">Preparation</a></li>
-<li><a href="#adding-new-schedulers">Adding New Schedulers</a></li>
+<li><a href="#tasks-are-stuck-in-pending-forever">Tasks are stuck in PENDING forever</a></li>
+<li><a href="#symptoms-2">Symptoms</a></li>
+<li><a href="#solution-2">Solution</a></li>
</ul></li>
</ul>
@@ -103,7 +93,7 @@ machines. This guide helps you get the
<p>The Aurora scheduler is a standalone Java server. As part of the build process it creates a bundle
of all its dependencies, with the notable exceptions of the JVM and libmesos. Each target server
-should have a JVM (Java 7 or higher) and libmesos (0.23.0) installed.</p>
+should have a JVM (Java 7 or higher) and libmesos (0.21.1) installed.</p>
<h3 id="creating-the-distribution-zip-file-optional-">Creating the Distribution .zip File (Optional)</h3>
@@ -269,23 +259,6 @@ restarted.</p>
</pre>
<p>assuming you set <code>-http_port=8081</code>.</p>
-<h2 id="security-considerations">Security Considerations</h2>
-
-<p>See <a href="/documentation/latest/security/">security.md</a>.</p>
-
-<h2 id="configuring-resource-oversubscription">Configuring Resource Oversubscription</h2>
-
-<p><strong>WARNING</strong>: This feature is currently in alpha status. Do not use it in production clusters!
-See <a href="configuration-reference.md#revocable-jobs">this document</a> for more feature details.</p>
-
-<p>Set these scheduler flag to allow receiving revocable Mesos offers:</p>
-<pre class="highlight text">-receive_revocable_resources=true
-</pre>
-<p>Specify a tier configuration file path:</p>
-<pre class="highlight text">-tier_config=path/to/tiers/config.json
-</pre>
-<p>Example <a href="../src/test/resources/org/apache/aurora/scheduler/tiers-example.json">tier configuration file</a>.</p>
configuration file</a>.</p>
-
<h3 id="maintaining-an-aurora-installation">Maintaining an Aurora Installation</h3>
<h3 id="monitoring">Monitoring</h3>
@@ -308,9 +281,6 @@ constraints are arbitrary and available
<code>dedicated</code> attribute. Aurora treats this specially, and only allows matching jobs to run on these
machines, and will only schedule matching jobs on these machines.</p>
-<p>See the <a href="resources.md#resource-quota">section</a> about resource quotas to learn how quotas apply to
-dedicated jobs.</p>
-
<h5 id="syntax">Syntax</h5>
<p>The dedicated attribute has semantic meaning. The format is <code>$role(/.*)?</code>. When a job is created,
@@ -322,7 +292,7 @@ enforce this.</p>
<h5 id="example">Example</h5>
<p>Consider the following slave command line:</p>
-<pre class="highlight text">mesos-slave --attributes="dedicated:db_team/redis" ...
+<pre class="highlight text">mesos-slave --attributes="host:$HOST;rack:$RACK;dedicated:db_team/redis" ...
</pre>
<p>And this job configuration:</p>
<pre class="highlight text">Service(
@@ -338,22 +308,6 @@ enforce this.</p>
<code>dedicated:db_team/redis</code>. Additionally, Aurora will prevent any
tasks that do <em>not</em> have that
constraint from running on those slaves.</p>
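To make the `$role(/.*)?` format concrete, the sketch below parses an attribute string like the one in the slave command line above and checks a `dedicated` value against a job role. It is a simplified, hypothetical helper, not Aurora or Mesos code: `parse_attributes` and `dedicated_role_matches` are invented names, and real Mesos attribute parsing also handles scalar, range, and set values.

```python
import re
from typing import Dict


def parse_attributes(spec: str) -> Dict[str, str]:
    """Simplified parse of a 'key:value;key:value' attribute string."""
    attrs: Dict[str, str] = {}
    for pair in spec.split(";"):
        if pair:
            key, _, value = pair.partition(":")
            attrs[key] = value
    return attrs


def dedicated_role_matches(dedicated_value: str, role: str) -> bool:
    """Check a dedicated value against the documented `$role(/.*)?` format."""
    # re.escape keeps role names containing regex metacharacters literal.
    return re.fullmatch(re.escape(role) + r"(/.*)?", dedicated_value) is not None
```

Under this reading, `db_team/redis` and plain `db_team` both belong to role `db_team`, while `db_team2/redis` does not.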
-<h2 id="best-practices">Best practices</h2>
-
-<h3 id="diversity">Diversity</h3>
-
-<p>Data centers are often organized with hierarchical failure domains. Common failure domains
-include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag
-the mesos-slave with them as
-<a href="https://mesos.apache.org/documentation/attributes-resources/">attributes</a>.</p>
-
-<p>When it comes time to schedule jobs, Aurora will automatically spread them across the failure
-domains as specified in the
-<a href="configuration-reference.md#specifying-scheduling-constraints">job configuration</a>.</p>
-
-<p>Note: in virtualized environments like EC2, the only attribute that usually makes sense for this
-purpose is <code>host</code>.</p>
-
<h2 id="common-problems">Common problems</h2>
<p>So you’ve started your first cluster and are running into some issues? We’ve collected some common
@@ -395,24 +349,20 @@ the master in ZooKeeper, make sure comma
<p>is the same as the one on the scheduler:</p>
<pre class="highlight text">-mesos_master_address=zk://$ZK_HOST:2181/mesos/master
</pre>
-<h2 id="changing-scheduler-quorum-size">Changing Scheduler Quorum Size</h2>
+<h3 id="tasks-are-stuck-in-pending-forever">Tasks are stuck in <code>PENDING</code> forever</h3>
-<p>Special care needs to be taken when changing the size of the Aurora scheduler quorum.
-Since Aurora uses a Mesos replicated log, similar steps need to be followed as when
-<a href="http://mesos.apache.org/documentation/latest/operational-guide">changing the mesos quorum size</a>.</p>
+<h4 id="symptoms">Symptoms</h4>
-<h3 id="preparation">Preparation</h3>
+<p>The scheduler is registered, and (receiving offers](docs/monitoring.md#scheduler<em>resource</em>offers),
+but tasks are perpetually shown as <code>PENDING - Constraint not satisfied: host</code>.</p>
-<p>Increase <a href="storage-config.md#-native_log_quorum_size">-native<em>log</em>quorum_size</a> on each
-existing scheduler and restart them. When updating from 3 to 5 schedulers, the quorum size
-would grow from 2 to 3.</p>
+<h4 id="solution">Solution</h4>
-<h3 id="adding-new-schedulers">Adding New Schedulers</h3>
+<p>Check that your slaves are configured with <code>host</code> and <code>rack</code> attributes. Aurora requires that
+slaves are tagged with these two common failure domains to ensure that it can safely place tasks
+such that jobs are resilient to failure.</p>
-<p>Start the new schedulers with <code>-native_log_quorum_size</code> set to the new value. Failing to
-first increase the quorum size on running schedulers can in some cases result in corruption
-or truncating of the replicated log used by Aurora. In that case, see the documentation on
-<a href="storage-config.md#recovering-from-a-scheduler-backup">recovering from backup</a>.</p>
+<p>See our <a href="examples/vagrant/upstart/mesos-slave.conf">vagrant example</a> for details.</p>
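A quick pre-flight check for the `PENDING - Constraint not satisfied: host` symptom is to confirm that each slave's attribute string carries both `host` and `rack`. The helper below is hypothetical and not an Aurora or Mesos tool; it assumes the simple `key:value;key:value` attribute syntax used in the mesos-slave example earlier in this page.

```python
from typing import Dict, Iterable, List

# The two failure-domain attributes the Solution paragraph says Aurora requires.
REQUIRED_ATTRIBUTES = ("host", "rack")


def parse_attributes(spec: str) -> Dict[str, str]:
    """Simplified parse of a 'key:value;key:value' attribute string."""
    attrs: Dict[str, str] = {}
    for pair in spec.split(";"):
        if pair:
            key, _, value = pair.partition(":")
            attrs[key] = value
    return attrs


def missing_attributes(spec: str, required: Iterable[str] = REQUIRED_ATTRIBUTES) -> List[str]:
    """Return the required attribute names that are absent or empty in `spec`."""
    attrs = parse_attributes(spec)
    return [name for name in required if not attrs.get(name)]
```

An attribute string containing only `dedicated:db_team/redis` would be reported as missing both `host` and `rack`.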
</div>
</div>
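The removed quorum-resizing text gives two data points (3 schedulers need a quorum of 2, 5 need 3), which fit a strict-majority rule. The sketch below assumes that rule and encodes the order of operations the removed text prescribes. `native_log_quorum_size` and `resize_steps` are hypothetical helpers; only the `-native_log_quorum_size` flag name comes from the source.

```python
from typing import List


def native_log_quorum_size(num_schedulers: int) -> int:
    """Smallest strict majority of `num_schedulers` replicas (assumed rule)."""
    if num_schedulers < 1:
        raise ValueError("need at least one scheduler")
    return num_schedulers // 2 + 1


def resize_steps(current: int, target: int) -> List[str]:
    """Order of operations when growing the scheduler quorum."""
    if target <= current:
        raise ValueError("this sketch only covers growing the quorum")
    new_quorum = native_log_quorum_size(target)
    return [
        # Existing schedulers must learn the larger quorum first.
        f"set -native_log_quorum_size={new_quorum} on the {current} existing schedulers and restart them",
        f"start {target - current} new schedulers with -native_log_quorum_size={new_quorum}",
    ]
```

Raising the quorum on the running schedulers before adding new ones is the step the removed text warns about: skipping it can corrupt or truncate the replicated log.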
Modified: aurora/site/publish/documentation/latest/developing-aurora-client/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/developing-aurora-client/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/developing-aurora-client/index.html (original)
+++ aurora/site/publish/documentation/latest/developing-aurora-client/index.html Mon Oct 26 18:57:24 2015
@@ -69,7 +69,7 @@ To start a virtual cluster, you need to
the aurora workspace. This will create a vagrant host named “devcluster”, with a mesos master, a set
of mesos slaves, and an aurora scheduler.</p>
-<p>If you have a change you would like to test in your local cluster, you’ll rebuild the client:</p>
+<p>If you have changed you would like to test in your local cluster, you’ll rebuild the client:</p>
<pre class="highlight text">vagrant ssh -c 'aurorabuild client'
</pre>
<p>Once this completes, the <code>aurora</code> command will reflect your changes.</p>
Modified: aurora/site/publish/documentation/latest/developing-aurora-scheduler/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/developing-aurora-scheduler/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/developing-aurora-scheduler/index.html (original)
+++ aurora/site/publish/documentation/latest/developing-aurora-scheduler/index.html Mon Oct 26 18:57:24 2015
@@ -118,25 +118,6 @@ bower remove <library name>
bower update <library name>
bower help
</pre>
-<h2 id="faster-iteration-in-vagrant">Faster Iteration in Vagrant</h2>
-
-<p>The scheduler serves UI assets from the classpath. For production deployments this means the assets
-are served from within a jar. However, for faster development iteration, the vagrant image is
-configured to add <code>/vagrant/dist/resources/main</code> to the head of CLASSPATH. This path is configured
-as a shared filesystem to the path on the host system where your Aurora repository lives. This means
-that any updates to dist/resources/main in your checkout will be reflected immediately in the UI
-served from within the vagrant image.</p>
-
-<p>The one caveat to this is that this path is under <code>dist</code> not <code>src</code>. This is because the assets must
-be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
-changes and see them reflected in the UI, you must first run <code>./gradlew processResources</code>. This is
-less than ideal, but better than having to restart the scheduler after every change. Additionally,
-gradle makes this process somewhat easier with the use of the <code>--continuous</code> flag. If you run:
-<code>./gradlew processResources --continuous</code> gradle will monitor the filesystem for changes and run the
-task automatically as necessary. This doesn’t quite provide hot-reload capabilities, but it does
-allow for <5s from save to changes being visibile in the UI with no further action required on the
-part of the developer.</p>
-
<h1 id="developing-the-aurora-build-system">Developing the Aurora Build System</h1>
<h2 id="bootstrapping-gradle">Bootstrapping Gradle</h2>
Modified: aurora/site/publish/documentation/latest/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/index.html (original)
+++ aurora/site/publish/documentation/latest/index.html Mon Oct 26 18:57:24 2015
@@ -72,24 +72,17 @@
<li><a href="/documentation/latest/storage/">Scheduler Storage</a></li>
<li><a href="/documentation/latest/storage-config/">Scheduler Storage and Maintenance</a></li>
<li><a href="/documentation/latest/sla/">SLA Measurement</a></li>
-<li><a href="/documentation/latest/resources/">Resource Isolation and Sizing</a></li>
+<li><a href="/documentation/latest/resource-isolation/">Resource Isolation and Sizing</a></li>
<li><a href="/documentation/latest/test-resource-generation/">Generating test resources</a></li>
</ul>
<h2 id="developers">Developers</h2>
<ul>
-<li><a href="../CONTRIBUTING.md">Contributing to the project</a></li>
+<li><a href="/documentation/latest/contributing/">Contributing to the project</a></li>
<li><a href="/documentation/latest/developing-aurora-scheduler/">Developing the Aurora Scheduler</a></li>
<li><a href="/documentation/latest/developing-aurora-client/">Developing the Aurora Client</a></li>
<li><a href="/documentation/latest/committers/">Committers Guide</a></li>
-<li><a href="/documentation/latest/build-system/">Build System</a></li>
-</ul>
-
-<h2 id="additional-resources">Additional Resources</h2>
-
-<ul>
-<li><a href="/documentation/latest/presentations/">Presentation videos and slides</a></li>
</ul>
</div>
Modified: aurora/site/publish/documentation/latest/monitoring/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/monitoring/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/monitoring/index.html (original)
+++ aurora/site/publish/documentation/latest/monitoring/index.html Mon Oct 26 18:57:24 2015
@@ -118,119 +118,171 @@ recommend you start with a strict value
adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
and thresholds make sense.</p>
-<h2 id="important-stats">Important stats</h2>
-
-<h3 id="code-code"><code>jvm_uptime_secs</code></h3>
+<h4 id="code-code"><code>jvm_uptime_secs</code></h4>
<p>Type: integer counter</p>
+<h4 id="description">Description</h4>
+
<p>The number of seconds the JVM process has been running. Comes from
<a
href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime()">RuntimeMXBean#getUptime()</a></p>
+<h4 id="alerting">Alerting</h4>
+
<p>Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
stay alive.</p>
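Reset detection on `jvm_uptime_secs` amounts to finding any sample that is lower than the one before it. A minimal sketch over a list of scraped values; `find_resets` is a hypothetical helper, and collecting the samples from the scheduler's stats endpoint is left out.

```python
from typing import List, Sequence


def find_resets(samples: Sequence[float]) -> List[int]:
    """Return indices where a value that should only grow went down (a restart)."""
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]
```

For samples `[10, 70, 130, 5, 65]`, the drop at index 3 marks a scheduler restart.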
+<h4 id="triage">Triage</h4>
+
<p>Look at the scheduler logs to identify the reason the scheduler is exiting.</p>
-<h3 id="code-code"><code>system_load_avg</code></h3>
+<h4 id="code-code"><code>system_load_avg</code></h4>
<p>Type: double gauge</p>
+<h4 id="description">Description</h4>
+
<p>The current load average of the system for the last minute. Comes from
<a
href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage()">OperatingSystemMXBean#getSystemLoadAverage()</a>.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>A high sustained value suggests that the scheduler machine may be over-utilized.</p>
+<h4 id="triage">Triage</h4>
+
<p>Use standard unix tools like <code>top</code> and <code>ps</code> to track down the offending process(es).</p>
-<h3 id="code-code"><code>process_cpu_cores_utilized</code></h3>
+<h4 id="code-code"><code>process_cpu_cores_utilized</code></h4>
<p>Type: double gauge</p>
+<h4 id="description">Description</h4>
+
<p>The current number of CPU cores in use by the JVM process. This should not exceed the number of
logical CPU cores on the machine. Derived from
<a href="http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html">OperatingSystemMXBean#getProcessCpuTime()</a></p>
+<h4 id="alerting">Alerting</h4>
+
<p>A high sustained value indicates that the scheduler is overworked. Due to current internal design
limitations, if this value is sustained at <code>1</code>, there is a good chance the scheduler is under water.</p>
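One plausible way to derive a cores-utilized figure from `getProcessCpuTime()`, which reports cumulative process CPU time in nanoseconds, is to divide the CPU time consumed between two samples by the wall-clock time between them. The text does not spell out Aurora's exact derivation, so treat this as an assumption; `cpu_cores_utilized` is a hypothetical helper.

```python
def cpu_cores_utilized(
    cpu_ns_start: int, cpu_ns_end: int, wall_ns_start: int, wall_ns_end: int
) -> float:
    """Average cores in use between two samples: CPU time consumed / wall time elapsed."""
    wall_elapsed = wall_ns_end - wall_ns_start
    if wall_elapsed <= 0:
        raise ValueError("wall-clock samples must be increasing")
    return (cpu_ns_end - cpu_ns_start) / wall_elapsed
```

Two seconds of CPU time over one second of wall time yields 2.0, meaning the process kept about two cores busy.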
+<h4 id="triage">Triage</h4>
+
<p>There are two main inputs that tend to drive this figure: task scheduling attempts and status
updates from Mesos. You may see activity in the scheduler logs to give an indication of where
time is being spent. Beyond that, it really takes good familiarity with the code to effectively
triage this. We suggest engaging with an Aurora developer.</p>
-<h3 id="code-code"><code>task_store_LOST</code></h3>
+<h4 id="code-code"><code>task_store_LOST</code></h4>
<p>Type: integer gauge</p>
+<h4 id="description">Description</h4>
+
<p>The number of tasks stored in the scheduler that are in the <code>LOST</code> state, and have been rescheduled.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>If this value is increasing at a high rate, it is a sign of trouble.</p>
+<h4 id="triage">Triage</h4>
+
<p>There are many sources of <code>LOST</code> tasks in Mesos: the scheduler, master, slave, and executor can all
trigger this. The first step is to look in the scheduler logs for <code>LOST</code> to identify where the
state changes are originating.</p>
-<h3 id="code-code"><code>scheduler_resource_offers</code></h3>
+<h4 id="code-code"><code>scheduler_resource_offers</code></h4>
<p>Type: integer counter</p>
+<h4 id="description">Description</h4>
+
<p>The number of resource offers that the scheduler has received.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>For a healthy scheduler, this value must be increasing over time.</p>
+<h5 id="triage">Triage</h5>
+
<p>Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
is sending offers. You should also look at the master’s web interface to see if it has a large
number of outstanding offers that it is waiting to be returned.</p>
-<h3 id="code-code"><code>framework_registered</code></h3>
+<h4 id="code-code"><code>framework_registered</code></h4>
<p>Type: binary integer counter</p>
+<h4 id="description">Description</h4>
+
<p>Will be <code>1</code> for the leading scheduler that is registered with the Mesos master, <code>0</code> for passive
schedulers,</p>
+<h4 id="alerting">Alerting</h4>
+
<p>A sustained period without a <code>1</code> (or where <code>sum() != 1</code>) warrants investigation.</p>
+<h4 id="triage">Triage</h4>
+
<p>If there is no leading scheduler, look in the scheduler and master logs for why. If there are
multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
bug.</p>
-<h3 id="code-code"><code>rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)</code></h3>
+<h4 id="code-code"><code>rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)</code></h4>
<p>Type: rate ratio of integer counters</p>
+<h4 id="description">Description</h4>
+
<p>This composes two counters to compute a windowed figure for the latency of replicated log writes.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>A hike in this value suggests disk bandwidth contention.</p>
+<h4 id="triage">Triage</h4>
+
<p>Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
standard tools like <code>vmstat</code> and <code>iotop</code> to identify whether the disk has become slow or
over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.</p>
-<h3 id="code-code"><code>timed_out_tasks</code></h3>
+<h4 id="code-code"><code>timed_out_tasks</code></h4>
<p>Type: integer counter</p>
+<h4 id="description">Description</h4>
+
<p>Tracks the number of times the scheduler has given up while waiting
(for <code>-transient_task_state_timeout</code>) to hear back about a task that is in a transient state
(e.g. <code>ASSIGNED</code>, <code>KILLING</code>), and has moved to <code>LOST</code> before rescheduling.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>This value is currently known to increase occasionally when the scheduler fails over
(<a href="https://issues.apache.org/jira/browse/AURORA-740">AURORA-740</a>). However, any large spike in this
value warrants investigation.</p>
+<h4 id="triage">Triage</h4>
+
<p>The scheduler will log when it times out a task. You should trace the task ID of the timed out
task into the master, slave, and/or executors to determine where the message was dropped.</p>
-<h3 id="code-code"><code>http_500_responses_events</code></h3>
+<h4 id="code-code"><code>http_500_responses_events</code></h4>
<p>Type: integer counter</p>
+<h4 id="description">Description</h4>
+
<p>The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.</p>
+<h4 id="alerting">Alerting</h4>
+
<p>An increase warrants investigation.</p>
+<h4 id="triage">Triage</h4>
+
<p>Look in scheduler logs to identify why the scheduler returned a 500, there should be a stack trace.</p>
</div>
Modified: aurora/site/publish/documentation/latest/sla/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/sla/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/sla/index.html (original)
+++ aurora/site/publish/documentation/latest/sla/index.html Mon Oct 26 18:57:24 2015
@@ -60,9 +60,8 @@
Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform
and hosted services.</p>
-<p>The Aurora SLA feature is by default only enabled for service (non-cron)
-production jobs (<code>"production = True"</code> in your <code>.aurora</code> config). It can be enabled for
-non-production services via the scheduler command line flag <code>-sla_non_prod_metrics</code>.</p>
+<p>The Aurora SLA feature currently supports stat collection only for service (non-cron)
+production jobs (<code>"production = True"</code> in your <code>.aurora</code> config).</p>
<p>Counters that track SLA measurements are computed periodically within the scheduler.
The individual instance metrics are refreshed every minute (configurable via
Modified: aurora/site/publish/documentation/latest/storage-config/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/storage-config/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/storage-config/index.html (original)
+++ aurora/site/publish/documentation/latest/storage-config/index.html Mon Oct 26 18:57:24 2015
@@ -158,9 +158,9 @@ accomplished by updating the following s
registering with Mesos. E.g.: <code>-mesos_master_address=zk://localhost:2181</code></li>
<li><code>-max_registration_delay</code> - set to sufficiently long interval to prevent registration timeout
and as a result scheduler suicide. E.g: <code>-max_registration_delay=360min</code></li>
-<li>Make sure <code>-reconciliation_initial_delay</code> option is set high enough (e.g.: <code>365days</code>) to
-prevent accidental task GC. This is important as scheduler will attempt to reconcile the cluster
-state and will kill all tasks when restarted with an empty Mesos replicated log.</li>
+<li>Make sure <code>-gc_executor_path</code> option is not set to prevent accidental task GC. This is
+important as scheduler will attempt to reconcile the cluster state and will kill all tasks when
+restarted with an empty Mesos replicated log.</li>
</ul></li>
<li><p>Restart all schedulers</p></li>
</ul>
Modified: aurora/site/publish/documentation/latest/test-resource-generation/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/test-resource-generation/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/test-resource-generation/index.html (original)
+++ aurora/site/publish/documentation/latest/test-resource-generation/index.html Mon Oct 26 18:57:24 2015
@@ -46,8 +46,9 @@
<p>The Aurora source repository and distributions contain several
<a href="../src/test/resources/org/apache/thermos/root/checkpoints">binary files</a> to
qualify the backwards-compatibility of thermos with checkpoint data. Since
-thermos persists state to disk, to be read by the thermos observer), it is important that we have
-tests that prevent regressions affecting the ability to parse previously-written data.</p>
+thermos persists state to disk, to be read by other components (the GC executor
+and the thermos observer), it is important that we have tests that prevent
+regressions affecting the ability to parse previously-written data.</p>
<h2 id="generating-test-files">Generating test files</h2>
Modified: aurora/site/publish/documentation/latest/vagrant/index.html
URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/latest/vagrant/index.html?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/documentation/latest/vagrant/index.html (original)
+++ aurora/site/publish/documentation/latest/vagrant/index.html Mon Oct 26 18:57:24 2015
@@ -78,7 +78,7 @@ common commands for this tool.</p>
<h2 id="clone-the-aurora-repository">Clone the Aurora repository</h2>
<p>To obtain the Aurora source distribution, clone its Git repository using the following command:</p>
-<pre class="highlight text"> git clone git://git.apache.org/aurora.git
+<pre class="highlight text"> git clone http://git.apache.org/aurora.git
</pre>
<h2 id="start-the-local-cluster">Start the local cluster</h2>
Modified: aurora/site/publish/sitemap.xml
URL: http://svn.apache.org/viewvc/aurora/site/publish/sitemap.xml?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/publish/sitemap.xml (original)
+++ aurora/site/publish/sitemap.xml Mon Oct 26 18:57:24 2015
@@ -2,158 +2,146 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://aurora.apache.org/blog/aurora-0-6-0-incubating-released/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/aurora-0-7-0-incubating-released/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/2015-upcoming-apache-aurora-meetups/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/aurora-0-8-0-released/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/aurora-0-9-0-released/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/aurora-at-mesoscon-seattle/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/blog/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/community/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/developers/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/docs/gettingstarted/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/docs/howtocontribute/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
- </url>
- <url>
- <loc>http://aurora.apache.org/documentation/latest/build-system/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/client-cluster-configuration/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/client-commands/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/committers/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/configuration-reference/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/configuration-tutorial/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/contributing/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/cron-jobs/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/deploying-aurora-scheduler/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/developing-aurora-client/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/developing-aurora-scheduler/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/hooks/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/monitoring/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
- </url>
- <url>
- <loc>http://aurora.apache.org/documentation/latest/presentations/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
- <loc>http://aurora.apache.org/documentation/latest/resources/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <loc>http://aurora.apache.org/documentation/latest/resource-isolation/</loc>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/scheduler-storage/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
- </url>
- <url>
- <loc>http://aurora.apache.org/documentation/latest/security/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/sla/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/storage-config/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/storage/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/test-resource-generation/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/thrift-deprecation/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/tutorial/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/user-guide/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/vagrant/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/documentation/latest/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/downloads/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
<url>
<loc>http://aurora.apache.org/</loc>
- <lastmod>2015-09-23T00:00:00-07:00</lastmod>
+ <lastmod>2015-10-26T00:00:00-05:00</lastmod>
</url>
</urlset>
\ No newline at end of file
Modified: aurora/site/source/community.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/community.html.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/community.html.md (original)
+++ aurora/site/source/community.html.md Mon Oct 26 18:57:24 2015
@@ -34,7 +34,7 @@
</div>
<div class="col-md-4">
<h3>Follow the Project</h3>
- <a class="twitter-timeline" href="https://twitter.com/ApacheAurora" data-widget-id="512693636127920129">Tweets by @ApacheMesos</a>
+ <a class="twitter-timeline" href="https://twitter.com/ApacheAurora" data-widget-id="512693636127920129">Tweets by @ApacheAurora</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>
</div>
</div>
Modified: aurora/site/source/documentation/latest.html.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest.html.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest.html.md (original)
+++ aurora/site/source/documentation/latest.html.md Mon Oct 26 18:57:24 2015
@@ -23,15 +23,11 @@ We encourage you to ask questions on the
* [Scheduler Storage](/documentation/latest/storage/)
* [Scheduler Storage and Maintenance](/documentation/latest/storage-config/)
* [SLA Measurement](/documentation/latest/sla/)
- * [Resource Isolation and Sizing](/documentation/latest/resources/)
+ * [Resource Isolation and Sizing](/documentation/latest/resource-isolation/)
* [Generating test resources](/documentation/latest/test-resource-generation/)
## Developers
- * [Contributing to the project](../CONTRIBUTING.md)
+ * [Contributing to the project](/documentation/latest/contributing/)
* [Developing the Aurora Scheduler](/documentation/latest/developing-aurora-scheduler/)
* [Developing the Aurora Client](/documentation/latest/developing-aurora-client/)
* [Committers Guide](/documentation/latest/committers/)
- * [Build System](/documentation/latest/build-system/)
-
-## Additional Resources
- * [Presentation videos and slides](/documentation/latest/presentations/)
Modified: aurora/site/source/documentation/latest/client-commands.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/client-commands.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/client-commands.md (original)
+++ aurora/site/source/documentation/latest/client-commands.md Mon Oct 26 18:57:24 2015
@@ -332,8 +332,7 @@ configuration file, and displays the par
aurora quota get CLUSTER/ROLE
Prints the production quota allocated to the role's value at the given
-cluster. Only non-[dedicated](deploying-aurora-scheduler.md#dedicated-attribute)
-[production](configuration-reference.md#job-objects) jobs consume quota.
+cluster.
### Finding a Job on Web UI
Modified: aurora/site/source/documentation/latest/configuration-reference.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/configuration-reference.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/configuration-reference.md (original)
+++ aurora/site/source/documentation/latest/configuration-reference.md Mon Oct 26 18:57:24 2015
@@ -26,12 +26,10 @@ Aurora + Thermos Configuration Reference
- [Job Schema](#job-schema)
- [Job Objects](#job-objects)
- [Services](#services)
- - [Revocable Jobs](#revocable-jobs)
- [UpdateConfig Objects](#updateconfig-objects)
- [HealthCheckConfig Objects](#healthcheckconfig-objects)
- [Announcer Objects](#announcer-objects)
- [Container Objects](#container)
- - [LifecycleConfig Objects](#lifecycleconfig-objects)
- [Specifying Scheduling Constraints](#specifying-scheduling-constraints)
- [Template Namespaces](#template-namespaces)
- [mesos Namespace](#mesos-namespace)
@@ -279,7 +277,6 @@ Client applications with higher priority
finalization wait (e.g. through parameters to `thermos kill`), so this
is mostly a best-effort signal.
-
### Constraint Object
Current constraint objects only support a single ordering constraint, `order`,
@@ -294,7 +291,7 @@ ordering constraints.
### Resource Object
Specifies the amount of CPU, Ram, and disk resources the task needs. See the
-[Resource Isolation document](/documentation/latest/resources/) for suggested values and to understand how
+[Resource Isolation document](/documentation/latest/resource-isolation/) for suggested values and to understand how
resources are allocated.
param | type | description
@@ -325,11 +322,9 @@ Job Schema
```service``` | Boolean | If True, restart tasks regardless of success or failure. (Default: False)
```max_task_failures``` | Integer | Maximum number of failures after which the task is considered to have failed (Default: 1) Set to -1 to allow for infinite failures
```priority``` | Integer | Preemption priority to give the task (Default 0). Tasks with higher priorities may preempt tasks at lower priorities.
- ```production``` | Boolean | Whether or not this is a production task that may [preempt](resources.md#task-preemption) other tasks (Default: False). Production job role must have the appropriate [quota](resources.md#resource-quota).
+ ```production``` | Boolean | Whether or not this is a production task backed by quota (Default: False). Production jobs may preempt any non-production job, and may only be preempted by production jobs in the same role and of higher priority. To run jobs at this level, the job role must have the appropriate quota. To grant quota to a particular role in production, operators use the ``aurora_admin set_quota`` command.
```health_check_config``` | ```heath_check_config``` object | Parameters for controlling a task's health checks via HTTP. Only used if a health port was assigned with a command line wildcard.
```container``` | ```Container``` object | An optional container to run all processes inside of.
- ```lifecycle``` | ```LifecycleConfig``` object | An optional task lifecycle configuration that dictates commands to be executed on startup/teardown. HTTP lifecycle is enabled by default if the "health" port is requested. See [LifecycleConfig Objects](#lifecycleconfig-objects) for more information.
- ```tier``` | String | Task tier type. When set to `revocable` requires the task to run with Mesos revocable resources. This is work [in progress](https://issues.apache.org/jira/browse/AURORA-1343) and is currently only supported for the revocable tasks. The ultimate goal is to simplify task configuration by hiding various configuration knobs behind a task tier definition. See AURORA-1343 and AURORA-1443 for more details.
### Services
@@ -341,21 +336,6 @@ Jobs without the service bit set only re
`max_task_failures` times and only if they terminated unsuccessfully
either due to human error or machine failure.
-### Revocable Jobs
-
-**WARNING**: This feature is currently in alpha status. Do not use it in production clusters!
-
-Mesos [supports a concept of revocable tasks](http://mesos.apache.org/documentation/latest/oversubscription/)
-by oversubscribing machine resources by the amount deemed safe to not affect the existing
-non-revocable tasks. Aurora now supports revocable jobs via a `tier` setting set to `revocable`
-value.
-
-More implementation details in this [ticket](https://issues.apache.org/jira/browse/AURORA-1343).
-
-Scheduler must be [configured](deploying-aurora-scheduler.md#configuring-resource-oversubscription)
-to receive revocable offers from Mesos and accept revocable jobs. If not configured properly
-revocable tasks will never get assigned to hosts and will stay in PENDING.
-
### UpdateConfig Objects
Parameters for controlling the rate and policy of rolling updates.
@@ -379,11 +359,8 @@ Parameters for controlling a task's heal
| ------- | :-------: | --------
| ```initial_interval_secs``` | Integer | Initial delay for performing an HTTP health check. (Default: 15)
| ```interval_secs``` | Integer | Interval on which to check the task's health via HTTP. (Default: 10)
-| ```max_consecutive_failures``` | Integer | Maximum number of consecutive failures that tolerated before considering a task unhealthy (Default: 0)
| ```timeout_secs``` | Integer | HTTP request timeout. (Default: 1)
-| ```endpoint``` | String | HTTP endpoint to check (Default: /health)
-| ```expected_response``` | String | If not empty, fail the health check if the response differs. Case insensitive. (Default: ok)
-| ```expected_response_code``` | Integer | If not zero, fail the health check if the response code differs. (Default: 0)
+| ```max_consecutive_failures``` | Integer | Maximum number of consecutive failures that tolerated before considering a task unhealthy (Default: 0)
### Announcer Objects
@@ -434,51 +411,9 @@ Describes the container the job's proces
### Docker Object
- param | type | description
- ----- | :----: | -----------
- ```image``` | String | The name of the docker image to execute. If the image does not exist locally it will be pulled with ```docker pull```.
- ```parameters``` | List(Parameter) | Additional parameters to pass to the docker containerizer.
-
-### Docker Parameter Object
-
-Docker CLI parameters. This needs to be enabled by the scheduler `enable_docker_parameters` option.
-See [Docker Command Line Reference](https://docs.docker.com/reference/commandline/run/) for valid parameters.
-
- param | type | description
- ----- | :----: | -----------
- ```name``` | String | The name of the docker parameter. E.g. volume
- ```value``` | String | The value of the parameter. E.g. /usr/local/bin:/usr/bin:rw
-
-### LifecycleConfig Objects
-
-*Note: The only lifecycle configuration supported is the HTTP lifecycle via the HTTPLifecycleConfig.*
-
- param | type | description
- ----- | :----: | -----------
- ```http``` | HTTPLifecycleConfig | Configure the lifecycle manager to send lifecycle commands to the task via HTTP.
-
-### HTTPLifecycleConfig Objects
-
- param | type | description
- ----- | :----: | -----------
- ```port``` | String | The named port to send POST commands (Default: health)
- ```graceful_shutdown_endpoint``` | String | Endpoint to hit to indicate that a task should gracefully shutdown. (Default: /quitquitquit)
- ```shutdown_endpoint``` | String | Endpoint to hit to give a task its final warning before being killed. (Default: /abortabortabort)
-
-#### graceful_shutdown_endpoint
-
-If the Job is listening on the port as specified by the HTTPLifecycleConfig
-(default: `health`), a HTTP POST request will be sent over localhost to this
-endpoint to request that the task gracefully shut itself down. This is a
-courtesy call before the `shutdown_endpoint` is invoked a fixed amount of
-time later.
-
-#### shutdown_endpoint
-
-If the Job is listening on the port as specified by the HTTPLifecycleConfig
-(default: `health`), a HTTP POST request will be sent over localhost to this
-endpoint to request as a final warning before being shut down. If the task
-does not shut down on its own after this, it will be forcefully killed
+ param | type | description
+ ----- | :----: | -----------
+ ```image``` | String | The name of the docker image to execute. If the image does not exist locally it will be pulled with ```docker pull```.
Specifying Scheduling Constraints
Modified: aurora/site/source/documentation/latest/configuration-tutorial.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/configuration-tutorial.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/configuration-tutorial.md (original)
+++ aurora/site/source/documentation/latest/configuration-tutorial.md Mon Oct 26 18:57:24 2015
@@ -581,7 +581,12 @@ Three attributes deal with configuring t
values.
- `production`: a Boolean, defaulting to `False`, specifying that this
- is a [production](configuration-reference.md#job-objects) job.
+ is a production job backed by quota. Tasks from production Jobs may
+ preempt tasks from any non-production job, and may only be preempted
+ by tasks from production jobs in the same role with higher
+ priority. **WARNING**: To run Jobs at this level, the Job role must
+ have the appropriate quota. To grant quota to a particular role in
+ production, operators use the ``aurora_admin set_quota`` command.
The final three Job attributes each take an object as their value.
Modified: aurora/site/source/documentation/latest/cron-jobs.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/cron-jobs.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/cron-jobs.md (original)
+++ aurora/site/source/documentation/latest/cron-jobs.md Mon Oct 26 18:57:24 2015
@@ -67,7 +67,7 @@ grow faster than they can process it.
Unlike with services, which aurora will always re-execute regardless of exit
status, instances of
cron jobs retry according to the `max_task_failures` attribute of the
-[Task](configuration-reference.md#task-objects) object. To get
"run-until-success" semantics,
+[Task](configuration-reference.md#task-objects) object. To get
"run-until-failure" semantics,
set `max_task_failures` to `-1`.
## Interacting with cron jobs via the Aurora CLI
Modified: aurora/site/source/documentation/latest/deploying-aurora-scheduler.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/deploying-aurora-scheduler.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/deploying-aurora-scheduler.md (original)
+++ aurora/site/source/documentation/latest/deploying-aurora-scheduler.md Mon Oct 26 18:57:24 2015
@@ -13,8 +13,6 @@ machines. This guide helps you get the
- [Storage Performance Considerations](#storage-performance-considerations)
- [Network considerations](#network-considerations)
- [Considerations for running jobs in docker](#considerations-for-running-jobs-in-docker)
- - [Security Considerations](#security-considerations)
- - [Configuring Resource Oversubscription](#configuring-resource-oversubscription)
- [Running Aurora](#running-aurora)
- [Maintaining an Aurora Installation](#maintaining-an-aurora-installation)
- [Monitoring](#monitoring)
@@ -22,8 +20,6 @@ machines. This guide helps you get the
- [Dedicated attribute](#dedicated-attribute)
- [Syntax](#syntax)
- [Example](#example)
-- [Best practices](#best-practices)
- - [Diversity](#diversity)
- [Common problems](#common-problems)
- [Replicated log not initialized](#replicated-log-not-initialized)
- [Symptoms](#symptoms)
@@ -31,14 +27,14 @@ machines. This guide helps you get the
- [Scheduler not registered](#scheduler-not-registered)
- [Symptoms](#symptoms-1)
- [Solution](#solution-1)
-- [Changing Scheduler Quorum Size](#changing-scheduler-quorum-size)
- - [Preparation](#preparation)
- - [Adding New Schedulers](#adding-new-schedulers)
+ - [Tasks are stuck in PENDING forever](#tasks-are-stuck-in-pending-forever)
+ - [Symptoms](#symptoms-2)
+ - [Solution](#solution-2)
## Installing Aurora
The Aurora scheduler is a standalone Java server. As part of the build process
it creates a bundle
of all its dependencies, with the notable exceptions of the JVM and libmesos.
Each target server
-should have a JVM (Java 7 or higher) and libmesos (0.23.0) installed.
+should have a JVM (Java 7 or higher) and libmesos (0.21.1) installed.
### Creating the Distribution .zip File (Optional)
To create a distribution for installation you will need build tools installed. On Ubuntu this can be
@@ -187,25 +183,6 @@ For example, monit can be configured wit
assuming you set `-http_port=8081`.
-## Security Considerations
-
-See [security.md](/documentation/latest/security/).
-
-## Configuring Resource Oversubscription
-
-**WARNING**: This feature is currently in alpha status. Do not use it in production clusters!
-See [this document](configuration-reference.md#revocable-jobs) for more feature details.
-
-Set these scheduler flag to allow receiving revocable Mesos offers:
-
- -receive_revocable_resources=true
-
-Specify a tier configuration file path:
-
- -tier_config=path/to/tiers/config.json
-
-Example [tier configuration file](../src/test/resources/org/apache/aurora/scheduler/tiers-example.json).
-
### Maintaining an Aurora Installation
### Monitoring
@@ -225,9 +202,6 @@ constraints are arbitrary and available
`dedicated` attribute. Aurora treats this specially, and only allows matching jobs to run on these
machines, and will only schedule matching jobs on these machines.
-See the [section](resources.md#resource-quota) about resource quotas to learn how quotas apply to
-dedicated jobs.
-
##### Syntax
The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created,
the scheduler requires that the `$role` component matches the `role` field in the job
@@ -238,7 +212,7 @@ enforce this.
##### Example
Consider the following slave command line:
- mesos-slave --attributes="dedicated:db_team/redis" ...
+ mesos-slave --attributes="host:$HOST;rack:$RACK;dedicated:db_team/redis" ...
And this job configuration:
@@ -255,19 +229,6 @@ The job configuration is indicating that
`dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do _not_ have that
constraint from running on those slaves.
-## Best practices
-### Diversity
-Data centers are often organized with hierarchical failure domains. Common failure domains
-include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag
-the mesos-slave with them as
-[attributes](https://mesos.apache.org/documentation/attributes-resources/).
-
-When it comes time to schedule jobs, Aurora will automatically spread them across the failure
-domains as specified in the
-[job configuration](configuration-reference.md#specifying-scheduling-constraints).
-
-Note: in virtualized environments like EC2, the only attribute that usually makes sense for this
-purpose is `host`.
## Common problems
So you've started your first cluster and are running into some issues? We've collected some common
@@ -309,18 +270,15 @@ is the same as the one on the scheduler:
-mesos_master_address=zk://$ZK_HOST:2181/mesos/master
-## Changing Scheduler Quorum Size
-Special care needs to be taken when changing the size of the Aurora scheduler quorum.
-Since Aurora uses a Mesos replicated log, similar steps need to be followed as when
-[changing the mesos quorum size](http://mesos.apache.org/documentation/latest/operational-guide).
-
-### Preparation
-Increase [-native_log_quorum_size](storage-config.md#-native_log_quorum_size) on each
-existing scheduler and restart them. When updating from 3 to 5 schedulers, the quorum size
-would grow from 2 to 3.
-
-### Adding New Schedulers
-Start the new schedulers with `-native_log_quorum_size` set to the new value. Failing to
-first increase the quorum size on running schedulers can in some cases result in corruption
-or truncating of the replicated log used by Aurora. In that case, see the documentation on
-[recovering from backup](storage-config.md#recovering-from-a-scheduler-backup).
+### Tasks are stuck in `PENDING` forever
+
+#### Symptoms
+The scheduler is registered, and (receiving offers](docs/monitoring.md#scheduler_resource_offers),
+but tasks are perpetually shown as `PENDING - Constraint not satisfied: host`.
+
+#### Solution
+Check that your slaves are configured with `host` and `rack` attributes. Aurora requires that
+slaves are tagged with these two common failure domains to ensure that it can safely place tasks
+such that jobs are resilient to failure.
+
+See our [vagrant example](examples/vagrant/upstart/mesos-slave.conf) for details.
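The attribute rules in the diff above can be illustrated with a minimal Python sketch. It is hypothetical and not Aurora's implementation; all function names are invented for illustration. It covers four things:

- parsing a slave `--attributes` string shaped like the `host:$HOST;rack:$RACK;dedicated:db_team/redis` example, assuming plain `name:value` pairs separated by `;` (Mesos also supports ranges and sets, which this ignores);
- reporting which of the `host`/`rack` failure domains are missing;
- checking a `dedicated` value against the documented `$role(/.*)?` format;
- computing a simple majority quorum, assumed to follow the usual `floor(n/2) + 1` rule and consistent with the "3 to 5 schedulers, 2 to 3" example.

```python
import re
from typing import Dict, List, Sequence


def parse_attributes(spec: str) -> Dict[str, str]:
    """Hypothetical parser for a --attributes string like "host:h1;rack:r1".

    Assumes simple scalar "name:value" pairs separated by ";"; only the
    first ":" in each pair separates the name from the value.
    """
    attrs: Dict[str, str] = {}
    for pair in spec.split(";"):
        if not pair:
            continue  # tolerate empty segments such as a trailing ";"
        name, _, value = pair.partition(":")
        attrs[name.strip()] = value.strip()
    return attrs


def missing_failure_domains(attrs: Dict[str, str],
                            required: Sequence[str] = ("host", "rack")) -> List[str]:
    """Return the required failure-domain attributes absent from a slave."""
    return [name for name in required if name not in attrs]


def dedicated_matches_role(dedicated: str, role: str) -> bool:
    """Check a dedicated value against the `$role(/.*)?` format for one role."""
    return re.fullmatch(re.escape(role) + r"(/.*)?", dedicated) is not None


def majority_quorum(n: int) -> int:
    """Majority quorum for n replicated-log replicas: floor(n / 2) + 1."""
    return n // 2 + 1


if __name__ == "__main__":
    attrs = parse_attributes("host:h1;rack:r1;dedicated:db_team/redis")
    print(missing_failure_domains(attrs))                          # []
    print(dedicated_matches_role(attrs["dedicated"], "db_team"))   # True
    print(dedicated_matches_role("db_team_x/redis", "db_team"))    # False
    print(majority_quorum(3), majority_quorum(5))                  # 2 3
```

Anchoring the regex with `fullmatch` means a role that is merely a prefix of the attribute (e.g. `db_team` vs `db_team_x/redis`) does not match.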
Modified: aurora/site/source/documentation/latest/developing-aurora-client.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/developing-aurora-client.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/developing-aurora-client.md (original)
+++ aurora/site/source/documentation/latest/developing-aurora-client.md Mon Oct 26 18:57:24 2015
@@ -30,7 +30,7 @@ To start a virtual cluster, you need to
the aurora workspace. This will create a vagrant host named "devcluster", with a mesos master, a set
of mesos slaves, and an aurora scheduler.
-If you have a change you would like to test in your local cluster, you'll rebuild the client:
+If you have changed you would like to test in your local cluster, you'll rebuild the client:
vagrant ssh -c 'aurorabuild client'
Modified: aurora/site/source/documentation/latest/developing-aurora-scheduler.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/developing-aurora-scheduler.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/developing-aurora-scheduler.md (original)
+++ aurora/site/source/documentation/latest/developing-aurora-scheduler.md Mon Oct 26 18:57:24 2015
@@ -90,25 +90,6 @@ use the following commands to view and m
bower update <library name>
bower help
-Faster Iteration in Vagrant
----------------------------
-The scheduler serves UI assets from the classpath. For production deployments this means the assets
-are served from within a jar. However, for faster development iteration, the vagrant image is
-configured to add `/vagrant/dist/resources/main` to the head of CLASSPATH. This path is configured
-as a shared filesystem to the path on the host system where your Aurora repository lives. This means
-that any updates to dist/resources/main in your checkout will be reflected immediately in the UI
-served from within the vagrant image.
-
-The one caveat to this is that this path is under `dist` not `src`. This is because the assets must
-be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
-changes and see them reflected in the UI, you must first run `./gradlew processResources`. This is
-less than ideal, but better than having to restart the scheduler after every change. Additionally,
-gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run:
-`./gradlew processResources --continuous` gradle will monitor the filesystem for changes and run the
-task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does
-allow for <5s from save to changes being visibile in the UI with no further action required on the
-part of the developer.
-
Developing the Aurora Build System
==================================
Modified: aurora/site/source/documentation/latest/monitoring.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/monitoring.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/monitoring.md (original)
+++ aurora/site/source/documentation/latest/monitoring.md Mon Oct 26 18:57:24 2015
@@ -74,108 +74,133 @@ recommend you start with a strict value
adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts
and thresholds make sense.
-## Important stats
-
-### `jvm_uptime_secs`
+#### `jvm_uptime_secs`
Type: integer counter
+#### Description
The number of seconds the JVM process has been running. Comes from
[RuntimeMXBean#getUptime()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime\(\))
+#### Alerting
Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to
stay alive.
+#### Triage
Look at the scheduler logs to identify the reason the scheduler is exiting.
-### `system_load_avg`
+#### `system_load_avg`
Type: double gauge
+#### Description
The current load average of the system for the last minute. Comes from
[OperatingSystemMXBean#getSystemLoadAverage()](http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage\(\)).
+#### Alerting
A high sustained value suggests that the scheduler machine may be over-utilized.
+#### Triage
Use standard unix tools like `top` and `ps` to track down the offending process(es).
-### `process_cpu_cores_utilized`
+#### `process_cpu_cores_utilized`
Type: double gauge
+#### Description
The current number of CPU cores in use by the JVM process. This should not exceed the number of
logical CPU cores on the machine. Derived from
[OperatingSystemMXBean#getProcessCpuTime()](http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html)
+#### Alerting
A high sustained value indicates that the scheduler is overworked. Due to current internal design
limitations, if this value is sustained at `1`, there is a good chance the scheduler is under water.
+#### Triage
There are two main inputs that tend to drive this figure: task scheduling attempts and status
updates from Mesos. You may see activity in the scheduler logs to give an indication of where
time is being spent. Beyond that, it really takes good familiarity with the code to effectively
triage this. We suggest engaging with an Aurora developer.
-### `task_store_LOST`
+#### `task_store_LOST`
Type: integer gauge
+#### Description
The number of tasks stored in the scheduler that are in the `LOST` state, and have been rescheduled.
+#### Alerting
If this value is increasing at a high rate, it is a sign of trouble.
+#### Triage
There are many sources of `LOST` tasks in Mesos: the scheduler, master, slave, and executor can all
trigger this. The first step is to look in the scheduler logs for `LOST` to identify where the
state changes are originating.
-### `scheduler_resource_offers`
+#### `scheduler_resource_offers`
Type: integer counter
+#### Description
The number of resource offers that the scheduler has received.
+#### Alerting
For a healthy scheduler, this value must be increasing over time.
+##### Triage
Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it
is sending offers. You should also look at the master's web interface to see if it has a large
number of outstanding offers that it is waiting to be returned.
-### `framework_registered`
+#### `framework_registered`
Type: binary integer counter
+#### Description
Will be `1` for the leading scheduler that is registered with the Mesos master, `0` for passive
schedulers,
+#### Alerting
A sustained period without a `1` (or where `sum() != 1`) warrants investigation.
+#### Triage
If there is no leading scheduler, look in the scheduler and master logs for why. If there are
multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical
bug.
-### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
+#### `rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)`
Type: rate ratio of integer counters
+#### Description
This composes two counters to compute a windowed figure for the latency of replicated log writes.
+#### Alerting
A hike in this value suggests disk bandwidth contention.
+#### Triage
Look in scheduler logs for any reported oddness with saving to the replicated log. Also use
standard tools like `vmstat` and `iotop` to identify whether the disk has become slow or
over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.
-### `timed_out_tasks`
+#### `timed_out_tasks`
Type: integer counter
+#### Description
Tracks the number of times the scheduler has given up while waiting
(for `-transient_task_state_timeout`) to hear back about a task that is in a transient state
(e.g. `ASSIGNED`, `KILLING`), and has moved to `LOST` before rescheduling.
+#### Alerting
This value is currently known to increase occasionally when the scheduler fails over
([AURORA-740](https://issues.apache.org/jira/browse/AURORA-740)). However, any large spike in this
value warrants investigation.
+#### Triage
The scheduler will log when it times out a task. You should trace the task ID of the timed out
task into the master, slave, and/or executors to determine where the message was dropped.
-### `http_500_responses_events`
+#### `http_500_responses_events`
Type: integer counter
+#### Description
The total number of HTTP 500 status responses sent by the scheduler. Includes API and asset serving.
+#### Alerting
An increase warrants investigation.
+#### Triage
Look in scheduler logs to identify why the scheduler returned a 500, there should be a stack trace.
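Three of the alerting rules in the monitoring diff above reduce to simple arithmetic. A minimal Python sketch of them follows, with hypothetical function names, assuming you already have sampled stat values from each scheduler:

- **Reset detection for `jvm_uptime_secs`:** any decrease between consecutive samples indicates a restart.
- **Windowed replicated-log write latency:** both `rate()` terms are taken over the same window, so the elapsed time cancels. The ratio is the delta of `scheduler_log_native_append_nanos_total` divided by the delta of `scheduler_log_native_append_events`, taken between two snapshots.
- **Leadership check:** the values of `framework_registered` across all schedulers should sum to exactly 1.

```python
import math
from typing import List, Sequence, Tuple


def find_resets(samples: Sequence[float]) -> List[int]:
    """Indices i where samples[i] < samples[i - 1], i.e. the JVM restarted."""
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


def append_latency_nanos(start: Tuple[int, int], end: Tuple[int, int]) -> float:
    """Average nanoseconds per replicated-log append over a window.

    Each snapshot is assumed to be
    (scheduler_log_native_append_nanos_total, scheduler_log_native_append_events)
    read at the start and end of the same window.
    """
    d_nanos = end[0] - start[0]
    d_events = end[1] - start[1]
    if d_events <= 0:
        # No appends in the window (or a counter reset): latency is undefined.
        return math.nan
    return d_nanos / d_events


def leadership_ok(framework_registered: Sequence[int]) -> bool:
    """True when exactly one scheduler reports framework_registered == 1."""
    return sum(framework_registered) == 1


if __name__ == "__main__":
    print(find_resets([10, 70, 130, 5, 65]))                          # [3]
    print(append_latency_nanos((1_000_000, 100), (3_000_000, 300)))   # 10000.0
    print(leadership_ok([1, 0, 0]), leadership_ok([1, 1, 0]))         # True False
```

Returning NaN rather than raising keeps an idle window from tripping an alert. Whether that is appropriate depends on your monitoring system.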
Modified: aurora/site/source/documentation/latest/sla.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/sla.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/sla.md (original)
+++ aurora/site/source/documentation/latest/sla.md Mon Oct 26 18:57:24 2015
@@ -15,9 +15,8 @@ The primary goal of the feature is colle
Agreements) metrics that defining a contractual relationship between the Aurora/Mesos platform
and hosted services.
-The Aurora SLA feature is by default only enabled for service (non-cron)
-production jobs (`"production = True"` in your `.aurora` config). It can be enabled for
-non-production services via the scheduler command line flag `-sla_non_prod_metrics`.
+The Aurora SLA feature currently supports stat collection only for service (non-cron)
+production jobs (`"production = True"` in your `.aurora` config).
Counters that track SLA measurements are computed periodically within the scheduler.
The individual instance metrics are refreshed every minute (configurable via
@@ -174,4 +173,4 @@ unreasonable resource constraints) do no
* The availability of Aurora SLA metrics is bound by the scheduler availability.
* All metrics are calculated at a pre-defined interval (currently set at 1 minute).
- Scheduler restarts may result in missed collections.
+ Scheduler restarts may result in missed collections.
\ No newline at end of file
Modified: aurora/site/source/documentation/latest/storage-config.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/storage-config.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/storage-config.md (original)
+++ aurora/site/source/documentation/latest/storage-config.md Mon Oct 26 18:57:24 2015
@@ -100,9 +100,9 @@ accomplished by updating the following s
registering with Mesos. E.g.: `-mesos_master_address=zk://localhost:2181`
* `-max_registration_delay` - set to sufficiently long interval to prevent registration timeout
and as a result scheduler suicide. E.g: `-max_registration_delay=360min`
- * Make sure `-reconciliation_initial_delay` option is set high enough (e.g.: `365days`) to
- prevent accidental task GC. This is important as scheduler will attempt to reconcile the cluster
- state and will kill all tasks when restarted with an empty Mesos replicated log.
+ * Make sure `-gc_executor_path` option is not set to prevent accidental task GC. This is
+ important as scheduler will attempt to reconcile the cluster state and will kill all tasks when
+ restarted with an empty Mesos replicated log.
+ restarted with an empty Mesos replicated log.
* Restart all schedulers
Modified: aurora/site/source/documentation/latest/test-resource-generation.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/test-resource-generation.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/test-resource-generation.md (original)
+++ aurora/site/source/documentation/latest/test-resource-generation.md Mon Oct 26 18:57:24 2015
@@ -4,8 +4,9 @@
The Aurora source repository and distributions contain several
[binary files](../src/test/resources/org/apache/thermos/root/checkpoints) to
qualify the backwards-compatibility of thermos with checkpoint data. Since
-thermos persists state to disk, to be read by the thermos observer), it is important that we have
-tests that prevent regressions affecting the ability to parse previously-written data.
+thermos persists state to disk, to be read by other components (the GC executor
+and the thermos observer), it is important that we have tests that prevent
+regressions affecting the ability to parse previously-written data.
## Generating test files
The files included represent persisted checkpoints that exercise different
Modified: aurora/site/source/documentation/latest/vagrant.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/vagrant.md?rev=1710677&r1=1710676&r2=1710677&view=diff
==============================================================================
--- aurora/site/source/documentation/latest/vagrant.md (original)
+++ aurora/site/source/documentation/latest/vagrant.md Mon Oct 26 18:57:24 2015
@@ -43,7 +43,7 @@ Clone the Aurora repository
To obtain the Aurora source distribution, clone its Git repository using the following command:
- git clone git://git.apache.org/aurora.git
+ git clone http://git.apache.org/aurora.git
Start the local cluster