Propchange:
aurora/site/publish/documentation/0.13.0/images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
aurora/site/publish/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
aurora/site/publish/documentation/0.13.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
aurora/site/publish/documentation/0.13.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: aurora/site/publish/documentation/0.13.0/images/runningtask.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/runningtask.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange: aurora/site/publish/documentation/0.13.0/images/runningtask.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: aurora/site/publish/documentation/0.13.0/images/stderr.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/stderr.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange: aurora/site/publish/documentation/0.13.0/images/stderr.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: aurora/site/publish/documentation/0.13.0/images/stdout.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/stdout.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange: aurora/site/publish/documentation/0.13.0/images/stdout.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: aurora/site/publish/documentation/0.13.0/images/storage_hierarchy.png
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/images/storage_hierarchy.png?rev=1739400&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
aurora/site/publish/documentation/0.13.0/images/storage_hierarchy.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: aurora/site/publish/documentation/0.13.0/index.html
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/index.html?rev=1739400&view=auto
==============================================================================
--- aurora/site/publish/documentation/0.13.0/index.html (added)
+++ aurora/site/publish/documentation/0.13.0/index.html Sat Apr 16 04:09:25 2016
@@ -0,0 +1,211 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Apache Aurora</title>
+ <link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
+ <link href="/assets/css/main.css" rel="stylesheet">
+ <!-- Analytics -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-45879646-1']);
+ _gaq.push(['_setDomainName', 'apache.org']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type =
'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ?
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <div class="container-fluid section-header">
+ <div class="container">
+ <div class="nav nav-bar">
+ <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300"
alt="Transparent Apache Aurora logo with dark background"/></a>
+ <ul class="nav navbar-nav navbar-right">
+ <li><a href="/documentation/latest/">Documentation</a></li>
+ <li><a href="/community/">Community</a></li>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container-fluid">
+ <div class="container content">
+ <div class="col-md-12 documentation">
+<h5 class="page-header text-uppercase">Documentation
+<select onChange="window.location.href='/documentation/' + this.value + '/'"
+ value="0.13.0">
+ <option value="0.13.0"
+ selected="selected">
+ 0.13.0
+ (latest)
+ </option>
+ <option value="0.12.0"
+ >
+ 0.12.0
+ </option>
+ <option value="0.11.0"
+ >
+ 0.11.0
+ </option>
+ <option value="0.10.0"
+ >
+ 0.10.0
+ </option>
+ <option value="0.9.0"
+ >
+ 0.9.0
+ </option>
+ <option value="0.8.0"
+ >
+ 0.8.0
+ </option>
+ <option value="0.7.0-incubating"
+ >
+ 0.7.0-incubating
+ </option>
+ <option value="0.6.0-incubating"
+ >
+ 0.6.0-incubating
+ </option>
+ <option value="0.5.0-incubating"
+ >
+ 0.5.0-incubating
+ </option>
+</select>
+</h5>
+<h2 id="introduction">Introduction</h2>
+
+<p>Apache Aurora is a service scheduler that runs on top of Apache Mesos,
enabling you to run
+long-running services, cron jobs, and ad-hoc jobs that take advantage of
Apache Mesos’ scalability,
+fault-tolerance, and resource isolation.</p>
+
+<p>We encourage you to ask questions on the <a
href="http://aurora.apache.org/community/">Aurora user list</a> or
+the <code>#aurora</code> IRC channel on <code>irc.freenode.net</code>.</p>
+
+<h2 id="getting-started">Getting Started</h2>
+
+<p>Information for everyone new to Apache Aurora.</p>
+
+<ul>
+<li><a href="getting-started/overview.md">Aurora System Overview</a></li>
+<li><a href="getting-started/tutorial.md">Hello World Tutorial</a></li>
+<li><a href="getting-started/vagrant.md">Local cluster with Vagrant</a></li>
+</ul>
+
+<h2 id="features">Features</h2>
+
+<p>Description of important Aurora features.</p>
+
+<ul>
+<li><a href="features/containers.md">Containers</a></li>
+<li><a href="features/cron-jobs.md">Cron Jobs</a></li>
+<li><a href="features/job-updates.md">Job Updates</a></li>
+<li><a href="features/multitenancy.md">Multitenancy</a></li>
+<li><a href="features/resource-isolation.md">Resource Isolation</a></li>
+<li><a href="features/constraints.md">Scheduling Constraints</a></li>
+<li><a href="features/services.md">Services</a></li>
+<li><a href="features/service-discovery.md">Service Discovery</a></li>
+<li><a href="features/sla-metrics.md">SLA Metrics</a></li>
+</ul>
+
+<h2 id="operators">Operators</h2>
+
+<p>For those who wish to manage and fine-tune an Aurora cluster.</p>
+
+<ul>
+<li><a href="operations/installation.md">Installation</a></li>
+<li><a href="operations/configuration.md">Configuration</a></li>
+<li><a href="operations/monitoring.md">Monitoring</a></li>
+<li><a href="operations/security.md">Security</a></li>
+<li><a href="operations/storage.md">Storage</a></li>
+<li><a href="operations/backup-restore.md">Backup</a></li>
+</ul>
+
+<h2 id="reference">Reference</h2>
+
+<p>The complete reference of commands, configuration options, and scheduler
internals.</p>
+
+<ul>
+<li><a href="reference/task-lifecycle.md">Task lifecycle</a></li>
+<li>Configuration (<code>.aurora</code> files)
+
+<ul>
+<li><a href="reference/configuration.md">Configuration Reference</a></li>
+<li><a href="reference/configuration-tutorial.md">Configuration
Tutorial</a></li>
+<li><a href="reference/configuration-best-practices.md">Configuration Best
Practices</a></li>
+<li><a href="reference/configuration-templating.md">Configuration
Templating</a></li>
+</ul></li>
+<li>Aurora Client
+
+<ul>
+<li><a href="reference/client-commands.md">Client Commands</a></li>
+<li><a href="reference/client-hooks.md">Client Hooks</a></li>
+<li><a href="reference/client-cluster-configuration.md">Client Cluster
Configuration</a></li>
+</ul></li>
+<li><a href="reference/scheduler-configuration.md">Scheduler
Configuration</a></li>
+</ul>
+
+<h2 id="additional-resources">Additional Resources</h2>
+
+<ul>
+<li><a href="additional-resources/tools.md">Tools integrating with
Aurora</a></li>
+<li><a href="additional-resources/presentations.md">Presentation videos and
slides</a></li>
+</ul>
+
+<h2 id="developers">Developers</h2>
+
+<p>All the information you need to start modifying Aurora and contributing
back to the project.</p>
+
+<ul>
+<li><a href="contributing/">Contributing to the project</a></li>
+<li><a href="development/committers-guide.md">Committer’s Guide</a></li>
+<li><a href="development/design-documents.md">Design Documents</a></li>
+<li>Developing the Aurora components:
+
+<ul>
+<li><a href="development/client.md">Client</a></li>
+<li><a href="development/scheduler.md">Scheduler</a></li>
+<li><a href="development/ui.md">Scheduler UI</a></li>
+<li><a href="development/thermos.md">Thermos</a></li>
+<li><a href="development/thrift.md">Thrift structures</a></li>
+</ul></li>
+</ul>
+
+</div>
+
+ </div>
+ </div>
+ <div class="container-fluid section-footer buffer">
+ <div class="container">
+ <div class="row">
+ <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
+ <ul>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/community/">Mailing Lists</a></li>
+ <li><a
href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
+ <li><a href="/documentation/latest/contributing/">How
To Contribute</a></li>
+ </ul>
+ </div>
+ <div class="col-md-2"><h3>The ASF</h3>
+ <ul>
+ <li><a href="http://www.apache.org/licenses/">License</a></li>
+ <li><a
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+ <li><a
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/">Security</a></li>
+ </ul>
+ </div>
+ <div class="col-md-6">
+ <p class="disclaimer">Copyright 2014 <a
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under
the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a
href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX
photo</a> displayed on the homepage is available under a <a
href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons
BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo
are trademarks of The Apache Software Foundation.</p>
+ </div>
+ </div>
+ </div>
+
+ </body>
+</html>
Added:
aurora/site/publish/documentation/0.13.0/operations/backup-restore/index.html
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/operations/backup-restore/index.html?rev=1739400&view=auto
==============================================================================
---
aurora/site/publish/documentation/0.13.0/operations/backup-restore/index.html
(added)
+++
aurora/site/publish/documentation/0.13.0/operations/backup-restore/index.html
Sat Apr 16 04:09:25 2016
@@ -0,0 +1,214 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Apache Aurora</title>
+ <link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
+ <link href="/assets/css/main.css" rel="stylesheet">
+ <!-- Analytics -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-45879646-1']);
+ _gaq.push(['_setDomainName', 'apache.org']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type =
'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ?
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <div class="container-fluid section-header">
+ <div class="container">
+ <div class="nav nav-bar">
+ <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300"
alt="Transparent Apache Aurora logo with dark background"/></a>
+ <ul class="nav navbar-nav navbar-right">
+ <li><a href="/documentation/latest/">Documentation</a></li>
+ <li><a href="/community/">Community</a></li>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container-fluid">
+ <div class="container content">
+ <div class="col-md-12 documentation">
+<h5 class="page-header text-uppercase">Documentation
+<select onChange="window.location.href='/documentation/' + this.value +
'/operations/backup-restore/'"
+ value="0.13.0">
+ <option value="0.13.0"
+ selected="selected">
+ 0.13.0
+ (latest)
+ </option>
+ <option value="0.12.0"
+ >
+ 0.12.0
+ </option>
+ <option value="0.11.0"
+ >
+ 0.11.0
+ </option>
+ <option value="0.10.0"
+ >
+ 0.10.0
+ </option>
+ <option value="0.9.0"
+ >
+ 0.9.0
+ </option>
+ <option value="0.8.0"
+ >
+ 0.8.0
+ </option>
+ <option value="0.7.0-incubating"
+ >
+ 0.7.0-incubating
+ </option>
+ <option value="0.6.0-incubating"
+ >
+ 0.6.0-incubating
+ </option>
+ <option value="0.5.0-incubating"
+ >
+ 0.5.0-incubating
+ </option>
+</select>
+</h5>
+<h1 id="recovering-from-a-scheduler-backup">Recovering from a Scheduler
Backup</h1>
+
+<p><strong>Be sure to read the entire page before attempting to restore from a
backup, as it may have
+unintended consequences.</strong></p>
+
+<h1 id="summary">Summary</h1>
+
+<p>The restoration procedure replaces the existing (possibly corrupted) Mesos
replicated log with an
+earlier, backed up, version and requires all schedulers to be taken down
temporarily while
+restoring. Once completed, the scheduler state resets to what it was when the
backup was created.
+This means any jobs/tasks created or updated after the backup are unknown to
the scheduler and will
+be killed shortly after the cluster restarts. All other tasks continue
operating as normal.</p>
+
+<p>Usually, it is a bad idea to restore a backup that is not extremely recent
(i.e. older than a few
+hours). This is because the scheduler will expect the cluster to look exactly
as the backup does,
+so any tasks that have been rescheduled since the backup was taken will be
killed.</p>
+
+<p>The instructions below have been verified in a <a
href="../getting-started/vagrant.md">Vagrant environment</a> and, with minor
+syntax/path changes, should be applicable to any Aurora cluster.</p>
+
+<h1 id="preparation">Preparation</h1>
+
+<p>Follow these steps to prepare the cluster for restoring from a backup:</p>
+
+<ul>
+<li><p>Stop all scheduler instances</p></li>
+<li><p>Consider blocking external traffic on the port defined by
<code>-http_port</code> for all schedulers to
+prevent users from interacting with the scheduler during the restoration
process. This helps with
+troubleshooting by reducing scheduler log noise and prevents users from
making changes that will
+be erased after the backup snapshot is restored.</p></li>
+<li><p>Configure <code>aurora_admin</code> access to run all commands listed in
+the <a href="#restore-from-backup">Restore from backup</a> section locally on the
leading scheduler:</p>
+
+<ul>
+<li>Make sure the <a
href="../reference/client-cluster-configuration.md">clusters.json</a> file
is configured to
+access the scheduler directly. Set the <code>scheduler_uri</code> setting and remove
<code>zk</code>. Since the leader can get
+re-elected during the restore steps, consider doing this on all scheduler
replicas.</li>
+<li><p>Depending on your particular security approach, you will need to either
turn off scheduler
+authorization by removing the scheduler
<code>-http_authentication_mechanism</code> flag or make sure the
+direct scheduler access is properly authorized. For example, with Kerberos you
will need to make
+a <code>/etc/hosts</code> file change to match your local IP to the scheduler
URL configured in keytabs:</p>
+
+<p><code>&lt;local_ip&gt; &lt;scheduler_domain_in_keytabs&gt;</code></p></li>
+</ul></li>
+<li><p>The next steps put the scheduler into a partially disabled
state in which it can still
+accept storage recovery requests but cannot schedule tasks or change task
states. This is
+accomplished by updating the following scheduler configuration options:</p>
+
+<ul>
+<li>Set <code>-mesos_master_address</code> to a non-existent zk address. This
prevents the scheduler from
+registering with Mesos. E.g.:
<code>-mesos_master_address=zk://localhost:1111/mesos/master</code></li>
+<li>Set <code>-max_registration_delay</code> to a sufficiently long interval
to prevent a registration timeout
+and the resulting scheduler suicide. E.g.:
<code>-max_registration_delay=360mins</code></li>
+<li>Make sure the <code>-reconciliation_initial_delay</code> option is set high
enough (e.g.: <code>365days</code>) to
+prevent accidental task GC. This is important because the scheduler will attempt to
reconcile the cluster
+state and will kill all tasks when restarted with an empty Mesos replicated
log.</li>
+</ul></li>
+<li><p>Restart all schedulers</p></li>
+</ul>
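+<p>As a rough sketch, the three scheduler flags from the steps above could be
collected as shown below. It assumes the scheduler is launched from a shell
script; <code>recovery_flags</code> is a hypothetical helper, not part of
Aurora, and the values are the examples given in the list.</p>

```shell
#!/bin/bash
# recovery_flags is a hypothetical helper, not an Aurora command.
# It prints the temporary flag overrides from the Preparation steps,
# one per line, using the example values from the list above.
recovery_flags() {
  printf '%s\n' \
    '-mesos_master_address=zk://localhost:1111/mesos/master' \
    '-max_registration_delay=360mins' \
    '-reconciliation_initial_delay=365days'
}

# Review the overrides before adding them to the scheduler start script.
recovery_flags
```

+<p>Keeping the overrides in one place makes them easier to remove again
during the final cleanup.</p>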
+
+<h1 id="cleanup-and-re-initialize-mesos-replicated-log">Cleanup and
re-initialize Mesos replicated log</h1>
+
+<p>Remove the corrupted files and re-initialize the Mesos replicated log:</p>
+
+<ul>
+<li>Stop schedulers</li>
+<li>Delete all files under <code>-native_log_file_path</code> on all
schedulers</li>
+<li>Initialize the Mesos replica’s log file: <code>sudo mesos-log initialize
--path=&lt;-native_log_file_path&gt;</code></li>
+<li>Start schedulers</li>
+</ul>
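+<p>The delete and initialize steps above might be scripted roughly as follows.
This is a dry-run sketch that only prints commands:
<code>reinit_log_commands</code> is a hypothetical helper, the path argument
stands in for your <code>-native_log_file_path</code> value, and the
<code>find</code> invocation assumes GNU find. How schedulers are stopped and
started depends on your deployment and is left out.</p>

```shell
#!/bin/bash
# Dry-run sketch: prints the commands for one scheduler host, runs nothing.
# reinit_log_commands is a hypothetical helper; its argument is the
# directory configured via -native_log_file_path on that host.
reinit_log_commands() {
  log_path="$1"
  # Stop the scheduler first (the mechanism depends on your deployment).
  # Delete everything under the log directory (assumes GNU find).
  echo "sudo find ${log_path} -mindepth 1 -delete"
  # Re-initialize the replica, as in the step above.
  echo "sudo mesos-log initialize --path=${log_path}"
  # Start the scheduler again afterwards.
}

# /path/to/native_log is a placeholder, not a real default.
reinit_log_commands /path/to/native_log
```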
+
+<h1 id="restore-from-backup">Restore from backup</h1>
+
+<p>At this point the scheduler is ready to rehydrate from the backup:</p>
+
+<ul>
+<li><p>Identify the leading scheduler by:</p>
+
+<ul>
+<li>examining the
<code>scheduler_lifecycle_LEADER_AWAITING_REGISTRATION</code> metric at the
scheduler
+<code>/vars</code> endpoint. The leader reports 1; all other replicas report 0.</li>
+<li>examining scheduler logs</li>
+<li>or examining the ZooKeeper registration under the path defined by
<code>-zk_endpoints</code>
+and <code>-serverset_path</code></li>
+</ul></li>
+<li><p>Locate the desired backup file, copy it to the leading
scheduler’s <code>-backup_dir</code> folder and stage
+recovery by running the following command on the leader:
+<code>aurora_admin scheduler_stage_recovery --bypass-leader-redirect
&lt;cluster&gt; scheduler-backup-&lt;yyyy-MM-dd-HH-mm&gt;</code></p></li>
+<li><p>At this point, the recovery snapshot is staged and available for manual
verification/modification
+via the <code>aurora_admin scheduler_print_recovery_tasks
--bypass-leader-redirect</code> and
+<code>scheduler_delete_recovery_tasks --bypass-leader-redirect</code> commands.
+See <code>aurora_admin help &lt;command&gt;</code> for usage details.</p></li>
+<li><p>Commit recovery. This instructs the scheduler to overwrite the existing
Mesos replicated log with
+the provided backup snapshot and initiate a mandatory failover:
+<code>aurora_admin scheduler_commit_recovery --bypass-leader-redirect
&lt;cluster&gt;</code></p></li>
+</ul>
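+<p>The stage and commit steps above can be laid out as a short dry-run
sketch. It prints the commands rather than executing them;
<code>recovery_commands</code> is a hypothetical helper, and the cluster name
and backup file name in the example call are made-up values.</p>

```shell
#!/bin/bash
# Dry-run sketch: prints the recovery commands in order, runs nothing.
# recovery_commands is a hypothetical helper, not part of aurora_admin.
recovery_commands() {
  cluster="$1"   # cluster name as defined in clusters.json
  backup="$2"    # backup file name located in -backup_dir
  # 1. Stage the backup on the leading scheduler.
  echo "aurora_admin scheduler_stage_recovery --bypass-leader-redirect ${cluster} ${backup}"
  # 2. Look up usage of the inspection command before verifying the snapshot.
  echo "aurora_admin help scheduler_print_recovery_tasks"
  # 3. Commit; this overwrites the replicated log and triggers a failover.
  echo "aurora_admin scheduler_commit_recovery --bypass-leader-redirect ${cluster}"
}

# Example values are assumptions for illustration only.
recovery_commands devcluster scheduler-backup-2016-04-16-04-09
```

+<p>Printing the commands first gives a chance to check the cluster and backup
names before anything touches the replicated log.</p>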
+
+<h1 id="cleanup">Cleanup</h1>
+
+<p>Undo any modifications made during the <a href="#preparation">Preparation</a>
sequence.</p>
+
+</div>
+
+ </div>
+ </div>
+ <div class="container-fluid section-footer buffer">
+ <div class="container">
+ <div class="row">
+ <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
+ <ul>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/community/">Mailing Lists</a></li>
+ <li><a
href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
+ <li><a href="/documentation/latest/contributing/">How
To Contribute</a></li>
+ </ul>
+ </div>
+ <div class="col-md-2"><h3>The ASF</h3>
+ <ul>
+ <li><a href="http://www.apache.org/licenses/">License</a></li>
+ <li><a
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+ <li><a
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/">Security</a></li>
+ </ul>
+ </div>
+ <div class="col-md-6">
+ <p class="disclaimer">Copyright 2014 <a
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under
the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a
href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX
photo</a> displayed on the homepage is available under a <a
href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons
BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo
are trademarks of The Apache Software Foundation.</p>
+ </div>
+ </div>
+ </div>
+
+ </body>
+</html>
Added:
aurora/site/publish/documentation/0.13.0/operations/configuration/index.html
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/operations/configuration/index.html?rev=1739400&view=auto
==============================================================================
---
aurora/site/publish/documentation/0.13.0/operations/configuration/index.html
(added)
+++
aurora/site/publish/documentation/0.13.0/operations/configuration/index.html
Sat Apr 16 04:09:25 2016
@@ -0,0 +1,321 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Apache Aurora</title>
+ <link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
+ <link href="/assets/css/main.css" rel="stylesheet">
+ <!-- Analytics -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-45879646-1']);
+ _gaq.push(['_setDomainName', 'apache.org']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type =
'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ?
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <div class="container-fluid section-header">
+ <div class="container">
+ <div class="nav nav-bar">
+ <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300"
alt="Transparent Apache Aurora logo with dark background"/></a>
+ <ul class="nav navbar-nav navbar-right">
+ <li><a href="/documentation/latest/">Documentation</a></li>
+ <li><a href="/community/">Community</a></li>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container-fluid">
+ <div class="container content">
+ <div class="col-md-12 documentation">
+<h5 class="page-header text-uppercase">Documentation
+<select onChange="window.location.href='/documentation/' + this.value +
'/operations/configuration/'"
+ value="0.13.0">
+ <option value="0.13.0"
+ selected="selected">
+ 0.13.0
+ (latest)
+ </option>
+ <option value="0.12.0"
+ >
+ 0.12.0
+ </option>
+ <option value="0.11.0"
+ >
+ 0.11.0
+ </option>
+ <option value="0.10.0"
+ >
+ 0.10.0
+ </option>
+ <option value="0.9.0"
+ >
+ 0.9.0
+ </option>
+ <option value="0.8.0"
+ >
+ 0.8.0
+ </option>
+ <option value="0.7.0-incubating"
+ >
+ 0.7.0-incubating
+ </option>
+ <option value="0.6.0-incubating"
+ >
+ 0.6.0-incubating
+ </option>
+ <option value="0.5.0-incubating"
+ >
+ 0.5.0-incubating
+ </option>
+</select>
+</h5>
+<h1 id="scheduler-configuration">Scheduler Configuration</h1>
+
+<p>The Aurora scheduler can take a variety of configuration options through
command-line arguments.
+Examples are available under <code>examples/scheduler/</code>. For a list of
available Aurora flags and their
+documentation, see <a href="../reference/scheduler-configuration.md">Scheduler
Configuration Reference</a>.</p>
+
+<h2 id="a-note-on-configuration">A Note on Configuration</h2>
+
+<p>Like Mesos, Aurora uses command-line flags for runtime configuration. As
such, the Aurora
+“configuration file” is typically a <code>scheduler.sh</code>
shell script of the form:</p>
+<pre class="highlight shell"><code><span style="color: #999988;font-style:
italic">#!/bin/bash</span>
+<span style="color: #008080">AURORA_HOME</span><span style="color:
#000000;font-weight: bold">=</span>/usr/local/aurora-scheduler
+
+<span style="color: #999988;font-style: italic"># Flags controlling the
JVM.</span>
+<span style="color: #008080">JAVA_OPTS</span><span style="color:
#000000;font-weight: bold">=(</span>
+ -Xmx2g
+ -Xms2g
+ <span style="color: #999988;font-style: italic"># GC tuning, etc.</span>
+<span style="color: #000000;font-weight: bold">)</span>
+
+<span style="color: #999988;font-style: italic"># Flags controlling the
scheduler.</span>
+<span style="color: #008080">AURORA_FLAGS</span><span style="color:
#000000;font-weight: bold">=(</span>
+ <span style="color: #999988;font-style: italic"># Port for client RPCs and
the web UI</span>
+ -http_port<span style="color: #000000;font-weight: bold">=</span>8081
+ <span style="color: #999988;font-style: italic"># Log configuration,
etc.</span>
+<span style="color: #000000;font-weight: bold">)</span>
+
+<span style="color: #999988;font-style: italic"># Environment variables
controlling libmesos</span>
+<span style="color: #0086B3">export </span><span style="color:
#008080">JAVA_HOME</span><span style="color: #000000;font-weight:
bold">=</span>...
+<span style="color: #0086B3">export </span><span style="color:
#008080">GLOG_v</span><span style="color: #000000;font-weight: bold">=</span>1
+<span style="color: #999988;font-style: italic"># Port used to communicate
with the Mesos master and for the replicated log</span>
+<span style="color: #0086B3">export </span><span style="color:
#008080">LIBPROCESS_PORT</span><span style="color: #000000;font-weight:
bold">=</span>8083
+
+<span style="color: #008080">JAVA_OPTS</span><span style="color:
#000000;font-weight: bold">=</span><span style="color: #d14">"</span><span
style="color: #000000;font-weight: bold">${</span><span style="color:
#008080">JAVA_OPTS</span><span style="background-color:
#f8f8f8">[*]</span><span style="color: #000000;font-weight: bold">}</span><span
style="color: #d14">"</span> <span style="color: #0086B3">exec</span> <span
style="color: #d14">"</span><span style="color:
#008080">$AURORA_HOME</span><span style="color:
#d14">/bin/aurora-scheduler"</span> <span style="color: #d14">"</span><span
style="color: #000000;font-weight: bold">${</span><span style="color:
#008080">AURORA_FLAGS</span><span style="background-color:
#f8f8f8">[@]</span><span style="color: #000000;font-weight: bold">}</span><span
style="color: #d14">"</span>
+</code></pre>
+
+<p>That way Aurora’s current flags are visible in <code>ps</code> and in
the <code>/vars</code> admin endpoint.</p>
+
+<h2 id="replicated-log-configuration">Replicated Log Configuration</h2>
+
+<p>Aurora schedulers use ZooKeeper to discover log replicas and elect a
leader. Only one scheduler is
+leader at a given time - the other schedulers follow log writes and prepare to
take over as leader
+but do not communicate with the Mesos master. Either 3 or 5 schedulers are
recommended in a
+production deployment depending on failure tolerance and they must have
persistent storage.</p>
+
+<p>Below is a summary of scheduler storage configuration flags that either
don’t have default values
+or require attention before deploying in a production environment.</p>
+
+<h3 id="native_log_quorum_size"><code>-native_log_quorum_size</code></h3>
+
+<p>Defines the Mesos replicated log quorum size. In a cluster with
<code>N</code> schedulers, the flag
+<code>-native_log_quorum_size</code> should be set to <code>floor(N/2) +
1</code>. So in a cluster with 1 scheduler
+it should be set to <code>1</code>, in a cluster with 3 it should be set to
<code>2</code>, and in a cluster of 5 it
+should be set to <code>3</code>.</p>
+
+<table><thead>
+<tr>
+<th>Number of schedulers (N)</th>
+<th><code>-native_log_quorum_size</code> setting (<code>floor(N/2) +
1</code>)</th>
+</tr>
+</thead><tbody>
+<tr>
+<td>1</td>
+<td>1</td>
+</tr>
+<tr>
+<td>3</td>
+<td>2</td>
+</tr>
+<tr>
+<td>5</td>
+<td>3</td>
+</tr>
+<tr>
+<td>7</td>
+<td>4</td>
+</tr>
+</tbody></table>
+
+<p><em>Incorrectly setting this flag will cause data corruption to
occur!</em></p>
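+<p>The <code>floor(N/2) + 1</code> rule can be checked with a few lines of
shell. This is a minimal sketch; <code>quorum_size</code> is a hypothetical
helper name, and it relies on shell integer division truncating toward zero,
which equals <code>floor</code> for positive values.</p>

```shell
#!/bin/bash
# quorum_size is a hypothetical helper, not an Aurora tool.
# It computes floor(N/2) + 1 for a cluster of N schedulers.
quorum_size() {
  n="$1"
  # Integer division truncates, which is floor() for positive n.
  echo $(( n / 2 + 1 ))
}

# Print the setting for the cluster sizes listed in the table above.
for n in 1 3 5 7; do
  echo "schedulers=${n} -native_log_quorum_size=$(quorum_size "${n}")"
done
```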
+
+<h3 id="native_log_file_path"><code>-native_log_file_path</code></h3>
+
+<p>Location of the Mesos replicated log files. Consider allocating a dedicated
disk (preferably SSD)
+for Mesos replicated log files to ensure optimal storage performance.</p>
+
+<h3 id="native_log_zk_group_path"><code>-native_log_zk_group_path</code></h3>
+
+<p>ZooKeeper path used for Mesos replicated log quorum discovery.</p>
+
+<p>See <a
href="../../src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java">code</a>
for
+other available Mesos replicated log configuration options and default
values.</p>
+
+<h3 id="changing-the-quorum-size">Changing the Quorum Size</h3>
+
+<p>Special care needs to be taken when changing the size of the Aurora
scheduler quorum.
+Since Aurora uses a Mesos replicated log, similar steps need to be followed as
when
+<a
href="http://mesos.apache.org/documentation/latest/operational-guide">changing
the mesos quorum size</a>.</p>
+
+<p>As a preparation, increase <code>-native_log_quorum_size</code> on each
existing scheduler and restart them.
+When updating from 3 to 5 schedulers, the quorum size would grow from 2 to
3.</p>
+
+<p>When starting the new schedulers, set
<code>-native_log_quorum_size</code> to the new value. Failing to
+first increase the quorum size on running schedulers can in some cases result
in corruption
+or truncation of the replicated log used by Aurora. In that case, see the
documentation on
+<a href="backup-restore.md">recovering from backup</a>.</p>
+
+<h2 id="backup-configuration">Backup Configuration</h2>
+
+<p>Configuration options for the Aurora scheduler backup manager.</p>
+
+<h3 id="backup_interval"><code>-backup_interval</code></h3>
+
+<p>The interval on which the scheduler writes local storage backups. The
default is every hour.</p>
+
+<h3 id="backup_dir"><code>-backup_dir</code></h3>
+
+<p>Directory to write backups to.</p>
+
+<h3 id="max_saved_backups"><code>-max_saved_backups</code></h3>
+
+<p>Maximum number of backups to retain before deleting the oldest
backup(s).</p>
+
+<h2 id="process-logs">Process Logs</h2>
+
+<h3 id="log-destination">Log destination</h3>
+
+<p>By default, Thermos will write process stdout/stderr to log files in the
sandbox. Process object configuration
+allows specifying alternate log file destinations like streamed stdout/stderr
or suppression of all log output.
+Default behavior can be configured for the entire cluster with the following
flag (through the <code>-thermos_executor_flags</code>
+argument to the Aurora scheduler):</p>
+<pre class="highlight plaintext"><code>--runner-logger-destination=both
+</code></pre>
+
+<p>The <code>both</code> setting sends logs to files and also streams them to the parent stdout/stderr.</p>
+
+<p>See <a href="../reference/configuration.md#logger">Configuration
Reference</a> for all destination options.</p>
+
+<h3 id="log-rotation">Log rotation</h3>
+
+<p>By default, Thermos will not rotate the stdout/stderr logs from child
processes and they will grow
+without bound. An individual user may change this behavior via configuration
on the Process object,
+but it may also be desirable to change the default configuration for the
entire cluster.
+In order to enable rotation by default, the following flags can be applied to
Thermos (through the
+<code>-thermos_executor_flags</code> argument to the Aurora scheduler):</p>
+<pre class="highlight plaintext"><code>--runner-logger-mode=rotate
+--runner-rotate-log-size-mb=100
+--runner-rotate-log-backups=10
+</code></pre>
+
+<p>In the above example, each instance of the Thermos runner will rotate
stderr/stdout logs once they
+reach 100 MiB in size and keep a maximum of 10 backups. If a user has provided
a custom setting for
+their process, it will override these default settings.</p>
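+
+<p>For capacity planning, the settings above bound the log space used by one process roughly as
+follows. This is a back-of-the-envelope sketch, assuming each of stdout and stderr keeps one active
+file plus the configured number of backups; the function name is hypothetical.</p>

```python
def max_rotated_log_mib(rotate_log_size_mb, rotate_log_backups, streams=2):
    """Approximate worst-case log footprint in MiB for one process.

    Assumes one active file plus `rotate_log_backups` rotated files per
    stream, each capped at `rotate_log_size_mb`, and two streams
    (stdout and stderr).
    """
    files_per_stream = rotate_log_backups + 1
    return streams * files_per_stream * rotate_log_size_mb


# With --runner-rotate-log-size-mb=100 and --runner-rotate-log-backups=10:
worst_case_mib = max_rotated_log_mib(100, 10)
```

+<p>Multiply the result by the number of processes per sandbox to estimate the disk a task may need
+for logs alone.</p>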
+
+<h2 id="thermos-executor-wrapper">Thermos Executor Wrapper</h2>
+
+<p>If you need to do computation before starting the thermos executor (for
example, setting a different
+<code>--announcer-hostname</code> parameter for every executor), then the
thermos executor should be invoked
+ inside a wrapper script. In such a case, the aurora scheduler should be
started with
+ <code>-thermos_executor_path</code> pointing to the wrapper script and
<code>-thermos_executor_resources</code>
+ set to a comma separated string of all the resources that should be copied
into
+ the sandbox (including the original thermos executor).</p>
+
+<p>For example, to wrap the executor inside a simple wrapper, start the scheduler with:
+<code>-thermos_executor_path=/path/to/wrapper.sh
-thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex</code></p>
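+
+<p>A wrapper along these lines could compute the per-host value and then hand over to the real
+executor. This is only a sketch: the sandbox-relative path <code>./thermos_executor.pex</code>, the use
+of the machine's fully qualified hostname, and the function name are assumptions, not something
+Aurora prescribes.</p>

```python
import socket


def build_executor_argv(passthrough_args, announcer_hostname=None,
                        executor_path="./thermos_executor.pex"):
    """Build the argv used to launch the real Thermos executor.

    executor_path is an assumed location of the pex inside the sandbox.
    """
    if announcer_hostname is None:
        # Assumption: announce under this machine's fully qualified name.
        announcer_hostname = socket.getfqdn()
    return ([executor_path]
            + list(passthrough_args)
            + ["--announcer-hostname=" + announcer_hostname])


# A real wrapper would finish with something like:
#   argv = build_executor_argv(sys.argv[1:])
#   os.execv(argv[0], argv)
```

+<p>Replacing the wrapper process with the executor (rather than spawning a child) keeps the process
+tree the same as in an unwrapped setup.</p>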
+
+<h3 id="docker-containers">Docker containers</h3>
+
+<p>In order for Aurora to launch jobs using docker containers, a few extra
configuration options
+must be set. The <a
href="http://mesos.apache.org/documentation/latest/docker-containerizer/">docker
containerizer</a>
+must be enabled on the mesos slaves by launching them with the
<code>--containerizers=docker,mesos</code> option.</p>
+
+<p>By default, Aurora will configure Mesos to copy the file specified in
<code>-thermos_executor_path</code>
+into the container’s sandbox. If using a wrapper script to launch the
thermos executor,
+specify the path to the wrapper in that argument. In addition, the path to the
executor pex itself
+must be included in the <code>-thermos_executor_resources</code> option. Doing
so will ensure that both the
+wrapper script and executor are correctly copied into the sandbox. Finally,
ensure the wrapper
+script does not access resources outside of the sandbox, as when the script is
run from within a
+docker container those resources will not exist.</p>
+
+<p>The scheduler flag <code>-global_container_mounts</code> allows mounting paths from the host (i.e., the slave)
+into all containers on that host. The format is a comma-separated list of
+<code>host_path:container_path[:mode]</code>
+tuples. For example
<code>-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro</code>
mounts
+<code>/opt/secret_keys_dir</code> from the slaves into all launched
containers. Valid modes are <code>ro</code> and <code>rw</code>.</p>
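+
+<p>To make the mount format concrete, the following sketch splits a flag value into its tuples. It
+illustrates the format only and is not the scheduler's own parser; the function name is
+hypothetical, and an omitted mode is reported as <code>None</code> because the default is not stated
+here.</p>

```python
def parse_container_mounts(value):
    """Parse 'host_path:container_path[:mode]' tuples separated by commas.

    Returns a list of (host_path, container_path, mode) tuples; mode is
    None when omitted, since the default is not stated in this document.
    """
    mounts = []
    for item in value.split(","):
        parts = item.strip().split(":")
        if len(parts) not in (2, 3) or not all(parts):
            raise ValueError("expected host_path:container_path[:mode], got %r" % item)
        mode = parts[2] if len(parts) == 3 else None
        if mode is not None and mode not in ("ro", "rw"):
            raise ValueError("mode must be 'ro' or 'rw', got %r" % mode)
        mounts.append((parts[0], parts[1], mode))
    return mounts


mounts = parse_container_mounts("/opt/secret_keys_dir:/mnt/secret_keys_dir:ro")
```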
+
+<p>If you would like to run a container with a read-only filesystem, it may also be necessary to
+use the scheduler flag <code>-thermos_home_in_sandbox</code> in order to set HOME to the sandbox
+before the executor runs. This ensures that the executor/runner PEX extraction happens
+inside the sandbox instead of the container filesystem root.</p>
+
+<p>If you would like to supply your own parameters to <code>docker run</code>
when launching jobs in docker
+containers, you may use the following flags:</p>
+<pre class="highlight plaintext"><code>-allow_docker_parameters
+-default_docker_parameters
+</code></pre>
+
+<p><code>-allow_docker_parameters</code> controls whether or not users may
pass their own configuration parameters
+through the job configuration files. If set to <code>false</code> (the
default), the scheduler will reject
+jobs with custom parameters. <em>NOTE</em>: this setting should be used with
caution as it allows any job
+owner to specify any parameters they wish, including those that may introduce
security concerns
+(<code>privileged=true</code>, for example).</p>
+
+<p><code>-default_docker_parameters</code> allows a cluster operator to
specify a universal set of parameters that
+should be used for every container that does not have parameters explicitly
configured at the job
+level. The argument accepts a multimap format:</p>
+<pre class="highlight
plaintext"><code>-default_docker_parameters="read-only=true,tmpfs=/tmp,tmpfs=/run"
+</code></pre>
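+
+<p>In the multimap format a key may appear more than once, and each occurrence contributes another
+value. The sketch below shows one way to read such a string; it is an illustration rather than
+the scheduler's parser, the function name is hypothetical, and values that themselves contain
+commas are not handled.</p>

```python
from collections import OrderedDict


def parse_multimap(value):
    """Parse 'k1=v1,k2=v2,k1=v3' into an ordered mapping of key -> list of values."""
    result = OrderedDict()
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected key=value, got %r" % pair)
        result.setdefault(key, []).append(val)
    return result


params = parse_multimap("read-only=true,tmpfs=/tmp,tmpfs=/run")
```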
+
+</div>
+
+ </div>
+ </div>
+ <div class="container-fluid section-footer buffer">
+ <div class="container">
+ <div class="row">
+ <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
+ <ul>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/community/">Mailing Lists</a></li>
+ <li><a
href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
+ <li><a href="/documentation/latest/contributing/">How
To Contribute</a></li>
+ </ul>
+ </div>
+ <div class="col-md-2"><h3>The ASF</h3>
+ <ul>
+ <li><a href="http://www.apache.org/licenses/">License</a></li>
+ <li><a
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+ <li><a
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/">Security</a></li>
+ </ul>
+ </div>
+ <div class="col-md-6">
+ <p class="disclaimer">Copyright 2014 <a
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under
the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a
href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX
photo</a> displayed on the homepage is available under a <a
href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons
BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo
are trademarks of The Apache Software Foundation.</p>
+ </div>
+ </div>
+ </div>
+
+ </body>
+</html>
Added:
aurora/site/publish/documentation/0.13.0/operations/installation/index.html
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/operations/installation/index.html?rev=1739400&view=auto
==============================================================================
--- aurora/site/publish/documentation/0.13.0/operations/installation/index.html
(added)
+++ aurora/site/publish/documentation/0.13.0/operations/installation/index.html
Sat Apr 16 04:09:25 2016
@@ -0,0 +1,451 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Apache Aurora</title>
+ <link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
+ <link href="/assets/css/main.css" rel="stylesheet">
+ <!-- Analytics -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-45879646-1']);
+ _gaq.push(['_setDomainName', 'apache.org']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type =
'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ?
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <div class="container-fluid section-header">
+ <div class="container">
+ <div class="nav nav-bar">
+ <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300"
alt="Transparent Apache Aurora logo with dark background"/></a>
+ <ul class="nav navbar-nav navbar-right">
+ <li><a href="/documentation/latest/">Documentation</a></li>
+ <li><a href="/community/">Community</a></li>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container-fluid">
+ <div class="container content">
+ <div class="col-md-12 documentation">
+<h5 class="page-header text-uppercase">Documentation
+<select onChange="window.location.href='/documentation/' + this.value +
'/operations/installation/'"
+ value="0.13.0">
+ <option value="0.13.0"
+ selected="selected">
+ 0.13.0
+ (latest)
+ </option>
+ <option value="0.12.0"
+ >
+ 0.12.0
+ </option>
+ <option value="0.11.0"
+ >
+ 0.11.0
+ </option>
+ <option value="0.10.0"
+ >
+ 0.10.0
+ </option>
+ <option value="0.9.0"
+ >
+ 0.9.0
+ </option>
+ <option value="0.8.0"
+ >
+ 0.8.0
+ </option>
+ <option value="0.7.0-incubating"
+ >
+ 0.7.0-incubating
+ </option>
+ <option value="0.6.0-incubating"
+ >
+ 0.6.0-incubating
+ </option>
+ <option value="0.5.0-incubating"
+ >
+ 0.5.0-incubating
+ </option>
+</select>
+</h5>
+<h1 id="installing-aurora">Installing Aurora</h1>
+
+<p>Source and binary distributions can be found on our
+<a href="https://aurora.apache.org/downloads/">downloads</a> page. Installing from binary packages is
+recommended for most users.</p>
+
+<ul>
+<li><a href="#installing-the-scheduler">Installing the scheduler</a></li>
+<li><a href="#installing-worker-components">Installing worker
components</a></li>
+<li><a href="#installing-the-client">Installing the client</a></li>
+<li><a href="#installing-mesos">Installing Mesos</a></li>
+<li><a href="#troubleshooting">Troubleshooting</a></li>
+</ul>
+
+<p>If our binary packages don’t suit you, our package build toolchain makes it easy to build your
+own packages. See the <a href="https://github.com/apache/aurora-packaging">instructions</a> to learn how.</p>
+
+<h2 id="machine-profiles">Machine profiles</h2>
+
+<p>Given that many of these components communicate over the network, there are
numerous ways you could
+assemble them to create an Aurora cluster. The simplest way is to think in
terms of three machine
+profiles:</p>
+
+<h3 id="coordinator">Coordinator</h3>
+
+<p><strong>Components</strong>: ZooKeeper, Aurora scheduler, Mesos master</p>
+
+<p>A small number of machines (typically 3 or 5) responsible for cluster
orchestration. In most cases
+it is fine to co-locate these components in anything but very large clusters
(> 1000 machines).
+Beyond that point, operators will likely want to manage these services on
separate machines.</p>
+
+<p>In practice, 5 coordinators have been shown to reliably manage clusters
with tens of thousands of
+machines.</p>
+
+<h3 id="worker">Worker</h3>
+
+<p><strong>Components</strong>: Aurora executor, Aurora observer, Mesos
agent</p>
+
+<p>The bulk of the cluster, where services will actually run.</p>
+
+<h3 id="client">Client</h3>
+
+<p><strong>Components</strong>: Aurora client, Aurora admin client</p>
+
+<p>Any machines that users submit jobs from.</p>
+
+<h2 id="installing-the-scheduler">Installing the scheduler</h2>
+
+<h3 id="ubuntu-trusty">Ubuntu Trusty</h3>
+
+<ol>
+<li><p>Install Mesos
+Skip down to <a href="#mesos-on-ubuntu-trusty">install mesos</a>, then run:</p>
+<pre class="highlight plaintext"><code>sudo start mesos-master
+</code></pre></li>
+<li><p>Install ZooKeeper</p>
+<pre class="highlight plaintext"><code>sudo apt-get install -y zookeeperd
+</code></pre></li>
+<li><p>Install the Aurora scheduler</p>
+<pre class="highlight plaintext"><code>sudo add-apt-repository -y
ppa:openjdk-r/ppa
+sudo apt-get update
+sudo apt-get install -y openjdk-8-jre-headless wget
+
+sudo update-alternatives --set java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+
+wget -c
https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.12.0_amd64.deb
+sudo dpkg -i aurora-scheduler_0.12.0_amd64.deb
+</code></pre></li>
+</ol>
+
+<h3 id="centos-7">CentOS 7</h3>
+
+<ol>
+<li><p>Install Mesos
+Skip down to <a href="#mesos-on-centos-7">install mesos</a>, then run:</p>
+<pre class="highlight plaintext"><code>sudo systemctl start mesos-master
+</code></pre></li>
+<li><p>Install ZooKeeper</p>
+<pre class="highlight plaintext"><code>sudo rpm -Uvh
https://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm
+sudo yum install -y java-1.8.0-openjdk-headless zookeeper-server
+
+sudo service zookeeper-server init
+sudo systemctl start zookeeper-server
+</code></pre></li>
+<li><p>Install the Aurora scheduler</p>
+<pre class="highlight plaintext"><code>sudo yum install -y wget
+
+wget -c
https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+sudo yum install -y aurora-scheduler-0.12.0-1.el7.centos.aurora.x86_64.rpm
+</code></pre></li>
+</ol>
+
+<h3 id="finalizing">Finalizing</h3>
+
+<p>By default, the scheduler will start in an uninitialized mode. This is
because external
+coordination is necessary to be certain operator error does not result in a
quorum of schedulers
+starting up and believing their databases are empty when in fact they should
be re-joining a
+cluster.</p>
+
+<p>Because of this, a fresh install of the scheduler will need intervention to
start up. First,
+stop the scheduler service.
+Ubuntu: <code>sudo stop aurora-scheduler</code>
+CentOS: <code>sudo systemctl stop aurora</code></p>
+
+<p>Now initialize the database:</p>
+<pre class="highlight plaintext"><code>sudo -u aurora mkdir -p
/var/lib/aurora/scheduler/db
+sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db
+</code></pre>
+
+<p>Now you can start the scheduler back up.
+Ubuntu: <code>sudo start aurora-scheduler</code>
+CentOS: <code>sudo systemctl start aurora</code></p>
+
+<h2 id="installing-worker-components">Installing worker components</h2>
+
+<h3 id="ubuntu-trusty">Ubuntu Trusty</h3>
+
+<ol>
+<li><p>Install Mesos
+Skip down to <a href="#mesos-on-ubuntu-trusty">install mesos</a>, then run:</p>
+<pre class="highlight plaintext"><code>sudo start mesos-slave
+</code></pre></li>
+<li><p>Install Aurora executor and observer</p>
+<pre class="highlight plaintext"><code>sudo apt-get install -y python2.7 wget
+
+# NOTE: This appears to be a missing dependency of the mesos deb package and
is needed
+# for the python mesos native bindings.
+sudo apt-get -y install libcurl4-nss-dev
+
+wget -c
https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.12.0_amd64.deb
+sudo dpkg -i aurora-executor_0.12.0_amd64.deb
+</code></pre></li>
+</ol>
+
+<h3 id="centos-7">CentOS 7</h3>
+
+<ol>
+<li><p>Install Mesos
+Skip down to <a href="#mesos-on-centos-7">install mesos</a>, then run:</p>
+<pre class="highlight plaintext"><code>sudo systemctl start mesos-slave
+</code></pre></li>
+<li><p>Install Aurora executor and observer</p>
+<pre class="highlight plaintext"><code>sudo yum install -y python2 wget
+
+wget -c
https://apache.bintray.com/aurora/centos-7/aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+sudo yum install -y aurora-executor-0.12.0-1.el7.centos.aurora.x86_64.rpm
+</code></pre></li>
+</ol>
+
+<h3 id="configuration">Configuration</h3>
+
+<p>The executor typically does not require configuration. Command line arguments can
+be passed to the executor through the <code>-thermos_executor_flags</code> argument on the scheduler.</p>
+
+<p>The observer needs to be configured to look at the correct Mesos directory in order to find task
+sandboxes. First, find the Mesos working directory by looking for the Mesos slave
+<code>--work_dir</code> flag. You should see something like:</p>
+<pre class="highlight plaintext"><code> ps -eocmd | grep "mesos-slave" |
grep -v grep | tr ' ' '\n' | grep "\--work_dir"
+ --work_dir=/var/lib/mesos
+</code></pre>
+
+<p>If the flag is not set, you can view the default value like so:</p>
+<pre class="highlight plaintext"><code> mesos-slave --help
+ Usage: mesos-slave [options]
+
+ ...
+ --work_dir=VALUE Directory path to place framework work directories
+ (default: /tmp/mesos)
+ ...
+</code></pre>
+
+<p>The value you find for <code>--work_dir</code>, <code>/var/lib/mesos</code>
in this example, should match the Aurora
+observer value for <code>--mesos-root</code>. You can look for that setting
in a similar way on a worker
+node by grepping for <code>thermos_observer</code> and
<code>--mesos-root</code>. If the flag is not set, you can view
+the default value like so:</p>
+<pre class="highlight plaintext"><code> thermos_observer -h
+ Options:
+ ...
+ --mesos-root=MESOS_ROOT
+ The mesos root directory to search for Thermos
+ executor sandboxes [default: /var/lib/mesos]
+ ...
+</code></pre>
+
+<p>In this case the default is <code>/var/lib/mesos</code> and we have a match. If there is no match, you can
+either adjust the mesos-slave start script(s) and restart the slave(s) or else adjust the
+Aurora observer start scripts and restart the observers. To adjust the Aurora observer:</p>
+
+<h4 id="ubuntu-trusty">Ubuntu Trusty</h4>
+<pre class="highlight plaintext"><code>sudo sh -c 'echo
"MESOS_ROOT=/tmp/mesos" >> /etc/default/thermos'
+</code></pre>
+
+<p>NB: In Aurora releases up through 0.12.0, you’ll also need to edit
/etc/init/thermos.conf like so:</p>
+<pre class="highlight diff"><code><span style="color: #999999">diff -C 1
/etc/init/thermos.conf.orig /etc/init/thermos.conf
+</span>*** /etc/init/thermos.conf.orig 2016-03-22 22:34:46.286199718 +0000
+<span style="color: #000000;background-color: #ffdddd">---
/etc/init/thermos.conf 2016-03-22 17:09:49.357689038 +0000
+</span>***************
+*** 24,25 ****
+<span style="color: #000000;background-color: #ffdddd">--- 24,26 ----
+</span> --port=${OBSERVER_PORT:-1338} \
+<span style="color: #000000;background-color: #ddffdd">+
--mesos-root=${MESOS_ROOT:-/var/lib/mesos} \
+</span> --log_to_disk=NONE \
+</code></pre>
+
+<h4 id="centos-7">CentOS 7</h4>
+
+<p>Make an edit to add the <code>--mesos-root</code> flag resulting in
something like:</p>
+<pre class="highlight plaintext"><code>grep -A5 OBSERVER_ARGS
/etc/sysconfig/thermos-observer
+OBSERVER_ARGS=(
+ --port=1338
+ --mesos-root=/tmp/mesos
+ --log_to_disk=NONE
+ --log_to_stderr=google:INFO
+)
+</code></pre>
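+
+<p>The comparison described in this section (slave <code>--work_dir</code> against observer
+<code>--mesos-root</code>) can be sketched as a small check over the two command lines. The function
+names are hypothetical; the fallback values are the defaults shown in the help output above.</p>

```python
def flag_value(command_line, flag, default=None):
    """Return the value of '--flag=value' in a command line, or default."""
    prefix = flag + "="
    for token in command_line.split():
        if token.startswith(prefix):
            return token[len(prefix):]
    return default


def observer_matches_slave(slave_cmd, observer_cmd):
    """True when the slave's --work_dir equals the observer's --mesos-root."""
    work_dir = flag_value(slave_cmd, "--work_dir", "/tmp/mesos")
    mesos_root = flag_value(observer_cmd, "--mesos-root", "/var/lib/mesos")
    return work_dir == mesos_root
```

+<p>The command lines can be taken from <code>ps -eocmd</code> on a worker, as in the example earlier in
+this section.</p>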
+
+<h2 id="installing-the-client">Installing the client</h2>
+
+<h3 id="ubuntu-trusty">Ubuntu Trusty</h3>
+<pre class="highlight plaintext"><code>sudo apt-get install -y python2.7 wget
+
+wget -c
https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.12.0_amd64.deb
+sudo dpkg -i aurora-tools_0.12.0_amd64.deb
+</code></pre>
+
+<h3 id="centos-7">CentOS 7</h3>
+<pre class="highlight plaintext"><code>sudo yum install -y python2 wget
+
+wget -c
https://apache.bintray.com/aurora/centos-7/aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+sudo yum install -y aurora-tools-0.12.0-1.el7.centos.aurora.x86_64.rpm
+</code></pre>
+
+<h3 id="mac-os-x">Mac OS X</h3>
+<pre class="highlight plaintext"><code>brew upgrade
+brew install aurora-cli
+</code></pre>
+
+<h3 id="configuration">Configuration</h3>
+
+<p>Client configuration lives in a json file that describes the clusters
available and how to reach
+them. By default this file is at <code>/etc/aurora/clusters.json</code>.</p>
+
+<p>Jobs may be submitted to the scheduler using the client, and are described
with
+<a href="../reference/configuration.md">job configurations</a> expressed in
<code>.aurora</code> files. Typically you will
+maintain a single job configuration file to describe one or more deployment
environments (e.g.
+dev, test, prod) for a production job.</p>
+
+<h2 id="installing-mesos">Installing Mesos</h2>
+
+<p>Mesos uses a single package for the Mesos master and slave. As a result,
the package dependencies
+are identical for both.</p>
+
+<h3 id="mesos-on-ubuntu-trusty">Mesos on Ubuntu Trusty</h3>
+<pre class="highlight plaintext"><code>sudo apt-key adv --keyserver
keyserver.ubuntu.com --recv E56151BF
+DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
+CODENAME=$(lsb_release -cs)
+
+echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
+ sudo tee /etc/apt/sources.list.d/mesosphere.list
+sudo apt-get -y update
+
+# Use `apt-cache showpkg mesos | grep [version]` to find the exact version.
+sudo apt-get -y install mesos=0.25.0-0.2.70.ubuntu1404
+</code></pre>
+
+<h3 id="mesos-on-centos-7">Mesos on CentOS 7</h3>
+<pre class="highlight plaintext"><code>sudo rpm -Uvh
https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm
+sudo yum -y install mesos-0.25.0
+</code></pre>
+
+<h2 id="troubleshooting">Troubleshooting</h2>
+
+<p>So you’ve started your first cluster and are running into some
issues? We’ve collected some common
+stumbling blocks and solutions here to help get you moving.</p>
+
+<h3 id="replicated-log-not-initialized">Replicated log not initialized</h3>
+
+<h4 id="symptoms">Symptoms</h4>
+
+<ul>
+<li>Scheduler RPCs and web interface claim <code>Storage is not
READY</code></li>
+<li>Scheduler log repeatedly prints messages like</li>
+</ul>
+<pre class="highlight plaintext"><code> I1016 16:12:27.234133 26081
replica.cpp:638] Replica in EMPTY status
+ received a broadcasted recover request
+ I1016 16:12:27.234256 26084 recover.cpp:188] Received a recover response
+ from a replica in EMPTY status
+</code></pre>
+
+<h4 id="solution">Solution</h4>
+
+<p>When you create a new cluster, you need to inform a quorum of schedulers
that they are safe to
+consider their database to be empty by <a href="#finalizing">initializing</a>
the
+replicated log. This is done to prevent the scheduler from modifying the
cluster state in the event
+of multiple simultaneous disk failures or, more likely, misconfiguration of
the replicated log path.</p>
+
+<h3 id="scheduler-not-registered">Scheduler not registered</h3>
+
+<h4 id="symptoms">Symptoms</h4>
+
+<p>Scheduler log contains</p>
+<pre class="highlight plaintext"><code>Framework has not been registered
within the tolerated delay.
+</code></pre>
+
+<h4 id="solution">Solution</h4>
+
+<p>Double-check that the scheduler is configured correctly to reach the Mesos master. If you are registering
+the master in ZooKeeper, make sure the command line argument to the master:</p>
+<pre class="highlight plaintext"><code>--zk=zk://$ZK_HOST:2181/mesos/master
+</code></pre>
+
+<p>is the same as the one on the scheduler:</p>
+<pre class="highlight
plaintext"><code>-mesos_master_address=zk://$ZK_HOST:2181/mesos/master
+</code></pre>
+
+<h3 id="scheduler-not-running">Scheduler not running</h3>
+
+<h4 id="symptom">Symptom</h4>
+
+<p>The scheduler process terminates itself regularly. This happens under error conditions, but
+also deliberately at regular intervals.</p>
+
+<h4 id="solution">Solution</h4>
+
+<p>Aurora is meant to be run under supervision. You have to configure a supervisor like
+<a href="http://mmonit.com/monit/">Monit</a> or <a href="http://supervisord.org/">supervisord</a> to run the scheduler
+and restart it whenever it fails or exits on purpose.</p>
+
+<p>Aurora supports an active health checking protocol on its admin HTTP
interface - if a <code>GET /health</code>
+times out or returns anything other than <code>200 OK</code> the scheduler
process is unhealthy and should be
+restarted.</p>
+
+<p>For example, monit can be configured with</p>
+<pre class="highlight plaintext"><code>if failed port 8081 send "GET /health
HTTP/1.0\r\n" expect "OK\n" with timeout 2 seconds for 10 cycles then restart
+</code></pre>
+
+<p>assuming you set <code>-http_port=8081</code>.</p>
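+
+<p>If your supervisor accepts a custom check command, the same health rule can be expressed in a few
+lines. The sketch below assumes Python 3 and the 2-second timeout from the monit example; the
+function names and the base URL you pass in are your own choices, not part of Aurora.</p>

```python
import urllib.error
import urllib.request


def is_healthy_status(status_code):
    """Per the text above, only a 200 response counts as healthy."""
    return status_code == 200


def check_health(base_url, timeout_secs=2):
    """Return True if GET <base_url>/health answers 200 within the timeout."""
    try:
        response = urllib.request.urlopen(base_url + "/health", timeout=timeout_secs)
        try:
            return is_healthy_status(response.getcode())
        finally:
            response.close()
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error response.
        return False
```

+<p>For a scheduler started with <code>-http_port=8081</code>, the check would be called as
+<code>check_health("http://localhost:8081")</code>.</p>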
+
+</div>
+
+ </div>
+ </div>
+ <div class="container-fluid section-footer buffer">
+ <div class="container">
+ <div class="row">
+ <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
+ <ul>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/community/">Mailing Lists</a></li>
+ <li><a
href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
+ <li><a href="/documentation/latest/contributing/">How
To Contribute</a></li>
+ </ul>
+ </div>
+ <div class="col-md-2"><h3>The ASF</h3>
+ <ul>
+ <li><a href="http://www.apache.org/licenses/">License</a></li>
+ <li><a
href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
+ <li><a
href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+ <li><a href="http://www.apache.org/security/">Security</a></li>
+ </ul>
+ </div>
+ <div class="col-md-6">
+ <p class="disclaimer">Copyright 2014 <a
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under
the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a
href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX
photo</a> displayed on the homepage is available under a <a
href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons
BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo
are trademarks of The Apache Software Foundation.</p>
+ </div>
+ </div>
+ </div>
+
+ </body>
+</html>
Added: aurora/site/publish/documentation/0.13.0/operations/monitoring/index.html
URL:
http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.13.0/operations/monitoring/index.html?rev=1739400&view=auto
==============================================================================
--- aurora/site/publish/documentation/0.13.0/operations/monitoring/index.html
(added)
+++ aurora/site/publish/documentation/0.13.0/operations/monitoring/index.html
Sat Apr 16 04:09:25 2016
@@ -0,0 +1,309 @@
+<!DOCTYPE html>
+<html lang="en">
+ <head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Apache Aurora</title>
+ <link rel="stylesheet"
href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
+ <link href="/assets/css/main.css" rel="stylesheet">
+ <!-- Analytics -->
+ <script type="text/javascript">
+ var _gaq = _gaq || [];
+ _gaq.push(['_setAccount', 'UA-45879646-1']);
+ _gaq.push(['_setDomainName', 'apache.org']);
+ _gaq.push(['_trackPageview']);
+
+ (function() {
+ var ga = document.createElement('script'); ga.type =
'text/javascript'; ga.async = true;
+ ga.src = ('https:' == document.location.protocol ?
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+ var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
+ })();
+ </script>
+ </head>
+ <body>
+ <div class="container-fluid section-header">
+ <div class="container">
+ <div class="nav nav-bar">
+ <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300"
alt="Transparent Apache Aurora logo with dark background"/></a>
+ <ul class="nav navbar-nav navbar-right">
+ <li><a href="/documentation/latest/">Documentation</a></li>
+ <li><a href="/community/">Community</a></li>
+ <li><a href="/downloads/">Downloads</a></li>
+ <li><a href="/blog/">Blog</a></li>
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container-fluid">
+ <div class="container content">
+ <div class="col-md-12 documentation">
+<h5 class="page-header text-uppercase">Documentation
+<select onChange="window.location.href='/documentation/' + this.value +
'/operations/monitoring/'"
+ value="0.13.0">
+ <option value="0.13.0"
+ selected="selected">
+ 0.13.0
+ (latest)
+ </option>
+ <option value="0.12.0"
+ >
+ 0.12.0
+ </option>
+ <option value="0.11.0"
+ >
+ 0.11.0
+ </option>
+ <option value="0.10.0"
+ >
+ 0.10.0
+ </option>
+ <option value="0.9.0"
+ >
+ 0.9.0
+ </option>
+ <option value="0.8.0"
+ >
+ 0.8.0
+ </option>
+ <option value="0.7.0-incubating"
+ >
+ 0.7.0-incubating
+ </option>
+ <option value="0.6.0-incubating"
+ >
+ 0.6.0-incubating
+ </option>
+ <option value="0.5.0-incubating"
+ >
+ 0.5.0-incubating
+ </option>
+</select>
+</h5>
+<h1 id="monitoring-your-aurora-cluster">Monitoring your Aurora cluster</h1>
+
+<p>Before you start running important services in your Aurora cluster,
it’s important to set up
+monitoring and alerting of Aurora itself. Most of your monitoring can be
against the scheduler,
+since it will give you a global view of what’s going on.</p>
+
+<h2 id="reading-stats">Reading stats</h2>
+
+<p>The scheduler exposes a <em>lot</em> of instrumentation data via its HTTP
interface. You can get a quick
+peek at the first few of these in our vagrant image:</p>
+<pre class="highlight plaintext"><code>$ vagrant ssh -c 'curl -s
localhost:8081/vars | head'
+async_tasks_completed 1004
+attribute_store_fetch_all_events 15
+attribute_store_fetch_all_events_per_sec 0.0
+attribute_store_fetch_all_nanos_per_event 0.0
+attribute_store_fetch_all_nanos_total 3048285
+attribute_store_fetch_all_nanos_total_per_sec 0.0
+attribute_store_fetch_one_events 3391
+attribute_store_fetch_one_events_per_sec 0.0
+attribute_store_fetch_one_nanos_per_event 0.0
+attribute_store_fetch_one_nanos_total 454690753
+</code></pre>
+
+<p>These values are served as <code>Content-Type: text/plain</code>, with each
line containing a space-separated metric
+name and value. Values may be integers, doubles, or strings (note: strings are
static, others
+may be dynamic).</p>
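+
+<p>Because the format is one space-separated name and value per line, a collector can parse it in a
+few lines. This is a minimal sketch with hypothetical function names; it tries integer, then
+double, and otherwise keeps the value as a string.</p>

```python
def parse_vars(text):
    """Parse the plain-text /vars body into a dict of name -> value."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, raw = line.partition(" ")
        stats[name] = _coerce(raw)
    return stats


def _coerce(raw):
    """Convert to int, then float, falling back to the original string."""
    for convert in (int, float):
        try:
            return convert(raw)
        except ValueError:
            pass
    return raw


sample = (
    "async_tasks_completed 1004\n"
    "attribute_store_fetch_all_events_per_sec 0.0\n"
)
stats = parse_vars(sample)
```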
+
+<p>If your monitoring infrastructure prefers JSON, the scheduler exports that
as well:</p>
+<pre class="highlight plaintext"><code>$ vagrant ssh -c 'curl -s
localhost:8081/vars.json | python -mjson.tool | head'
+{
+ "async_tasks_completed": 1009,
+ "attribute_store_fetch_all_events": 15,
+ "attribute_store_fetch_all_events_per_sec": 0.0,
+ "attribute_store_fetch_all_nanos_per_event": 0.0,
+ "attribute_store_fetch_all_nanos_total": 3048285,
+ "attribute_store_fetch_all_nanos_total_per_sec": 0.0,
+ "attribute_store_fetch_one_events": 3409,
+ "attribute_store_fetch_one_events_per_sec": 0.0,
+ "attribute_store_fetch_one_nanos_per_event": 0.0,
+</code></pre>
+
+<p>This will be the same data as above, served with <code>Content-Type:
application/json</code>.</p>
+
+<h2 id="viewing-live-stat-samples-on-the-scheduler">Viewing live stat samples
on the scheduler</h2>
+
+<p>The scheduler uses the Twitter commons stats library, which keeps an
internal time-series database
+of exported variables - nearly everything in <code>/vars</code> is available
for instant graphing. This is
+useful for debugging, but is not a replacement for an external monitoring
system.</p>
+
+<p>You can view these graphs on a scheduler at <code>/graphview</code>. It
supports some composition and
+aggregation of values, which can be invaluable when triaging a problem. For
example, if you have
+the scheduler running in vagrant, check out these links:
+<a href="http://192.168.33.7:8081/graphview?query=jvm_uptime_secs">simple
graph</a>
+<a
href="http://192.168.33.7:8081/graphview?query=rate(scheduler_log_native_append_nanos_total)%2Frate(scheduler_log_native_append_events)%2F1e6">complex
composition</a></p>
+
+<h3 id="counters-and-gauges">Counters and gauges</h3>
+
+<p>Among numeric stats, there are two fundamental types of stats exported:
<em>counters</em> and <em>gauges</em>.
+Counters are guaranteed to be monotonically-increasing for the lifetime of a
process, while gauges
+may decrease in value. Aurora uses counters to represent things like the
number of times an event
+has occurred, and gauges to capture things like the current length of a queue.
Counters are a
+natural fit for accurate composition into <a
href="http://en.wikipedia.org/wiki/Rate_ratio">rate ratios</a>
+(useful for sample-resistant latency calculation), while gauges are not.</p>
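+
+<p>As an illustration of composing two counters into a rate ratio, the sketch below computes the same
+quantity as the graphview query shown earlier (append nanos per append event, divided by 1e6 to
+get milliseconds) from two samples of <code>/vars</code>. It assumes Python 3 division, and the
+function name is hypothetical.</p>

```python
def average_latency_ms(earlier, later,
                       nanos_key="scheduler_log_native_append_nanos_total",
                       events_key="scheduler_log_native_append_events"):
    """Average latency per event, in milliseconds, between two samples.

    `earlier` and `later` are dicts of counter values taken at two points
    in time. Because both stats are counters, the deltas are non-negative
    unless the process restarted in between.
    """
    delta_nanos = later[nanos_key] - earlier[nanos_key]
    delta_events = later[events_key] - earlier[events_key]
    if delta_nanos < 0 or delta_events < 0:
        raise ValueError("counter went backwards; the process may have restarted")
    if delta_events == 0:
        return None  # No events in the window, so no latency to report.
    return delta_nanos / delta_events / 1e6
```

+<p>Since both deltas cover the same window, the sampling interval cancels out, which is why the
+result does not depend on how often you scrape.</p>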
+
+<h1 id="alerting">Alerting</h1>
+
+<h2 id="quickstart">Quickstart</h2>
+
+<p>If you are looking for just bare-minimum alerting to get something in place
quickly, set up alerting
+on <code>framework_registered</code> and <code>task_store_LOST</code>. These
will give you a decent picture of overall
+health.</p>
+
+<h2 id="a-note-on-thresholds">A note on thresholds</h2>
+
+<p>One of the most difficult things in monitoring is choosing alert
thresholds. With many of these
+stats, there is no value we can offer as a threshold that will be guaranteed
to work for you. It
+will depend on the size of your cluster, number of jobs, churn of tasks in the
cluster, etc. We
+recommend you start with a strict value after viewing a small amount of
collected data, and then
+adjust thresholds as you see fit. Feel free to ask us if you would like to
validate that your alerts
+and thresholds make sense.</p>
+
+<h2 id="important-stats">Important stats</h2>
+
+<h3 id="jvm_uptime_secs"><code>jvm_uptime_secs</code></h3>
+
+<p>Type: integer counter</p>
+
+<p>The number of seconds the JVM process has been running. Comes from
+<a
href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime()">RuntimeMXBean#getUptime()</a>.</p>
+
+<p>Detecting resets (decreasing values) on this stat will tell you that the
scheduler is failing to
+stay alive.</p>
+
+<p>Look at the scheduler logs to identify the reason the scheduler is
exiting.</p>
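+
+<p>A reset shows up as a sample that is lower than the one before it. The
+sketch below counts such drops in a series of collected samples; the
+function name and the sample values are illustrative assumptions, and how
+samples are collected is left to your monitoring system.</p>
+

```python
def count_resets(samples):
    """Count how many times a counter series decreases between samples."""
    return sum(
        1
        for earlier, later in zip(samples, samples[1:])
        if later < earlier
    )


# Hypothetical jvm_uptime_secs samples taken once a minute.
uptime_samples = [100, 160, 220, 15, 75, 5]
resets = count_resets(uptime_samples)
```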
+
+<h3 id="system_load_avg"><code>system_load_avg</code></h3>
+
+<p>Type: double gauge</p>
+
+<p>The current load average of the system for the last minute. Comes from
+<a
href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage()">OperatingSystemMXBean#getSystemLoadAverage()</a>.</p>
+
+<p>A high sustained value suggests that the scheduler machine may be
over-utilized.</p>
+
+<p>Use standard unix tools like <code>top</code> and <code>ps</code> to track
down the offending process(es).</p>
+
+<h3
id="process_cpu_cores_utilized"><code>process_cpu_cores_utilized</code></h3>
+
+<p>Type: double gauge</p>
+
+<p>The current number of CPU cores in use by the JVM process. This should not
exceed the number of
+logical CPU cores on the machine. Derived from
+<a
href="http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html">OperatingSystemMXBean#getProcessCpuTime()</a>.</p>
+
+<p>A high sustained value indicates that the scheduler is overworked. Due to
current internal design
+limitations, if this value is sustained at <code>1</code>, there is a good
chance the scheduler is under water.</p>
+
+<p>There are two main inputs that tend to drive this figure: task scheduling
attempts and status
+updates from Mesos. You may see activity in the scheduler logs to give an
indication of where
+time is being spent. Beyond that, it really takes good familiarity with the
code to effectively
+triage this. We suggest engaging with an Aurora developer.</p>
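+
+<p>One way to express "sustained" for a gauge is to require a number of
+consecutive samples at or above a threshold. The sketch below does that;
+the threshold of <code>0.95</code>, the window of three samples, and the
+function name are assumptions to tune for your cluster, not recommended
+values.</p>
+

```python
def sustained_at_or_above(samples, threshold, min_consecutive):
    """True if at least min_consecutive samples in a row reach threshold."""
    run = 0
    for value in samples:
        # Extend the current run, or start over on a sample below threshold.
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical process_cpu_cores_utilized samples.
busy = sustained_at_or_above([0.4, 0.97, 0.99, 1.0, 0.98], 0.95, 3)
spiky = sustained_at_or_above([0.97, 0.5, 0.99, 0.6], 0.95, 3)
```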
+
+<h3 id="task_store_lost"><code>task_store_LOST</code></h3>
+
+<p>Type: integer gauge</p>
+
+<p>The number of tasks stored in the scheduler that are in the
<code>LOST</code> state, and have been rescheduled.</p>
+
+<p>If this value is increasing at a high rate, it is a sign of trouble.</p>
+
+<p>There are many sources of <code>LOST</code> tasks in Mesos: the scheduler,
master, slave, and executor can all
+trigger this. The first step is to look in the scheduler logs for
<code>LOST</code> to identify where the
+state changes are originating.</p>
+
+<h3 id="scheduler_resource_offers"><code>scheduler_resource_offers</code></h3>
+
+<p>Type: integer counter</p>
+
+<p>The number of resource offers that the scheduler has received.</p>
+
+<p>For a healthy scheduler, this value must be increasing over time.</p>
+
+<p>Assuming the scheduler is up and otherwise healthy, you will want to check
if the master thinks it
+is sending offers. You should also look at the master’s web interface to
see if it has a large
+number of outstanding offers that it is waiting to have returned.</p>
+
+<h3 id="framework_registered"><code>framework_registered</code></h3>
+
+<p>Type: binary integer counter</p>
+
+<p>Will be <code>1</code> for the leading scheduler that is registered with
the Mesos master, <code>0</code> for passive
+schedulers.</p>
+
+<p>A sustained period without a <code>1</code> (or where <code>sum() !=
1</code>) warrants investigation.</p>
+
+<p>If there is no leading scheduler, look in the scheduler and master logs for
why. If there are
+multiple schedulers claiming leadership, this suggests a split brain and
warrants filing a critical
+bug.</p>
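+
+<p>The <code>sum() != 1</code> check can be sketched as follows, assuming
+you have already collected the current <code>framework_registered</code>
+value from every scheduler. The host names and the returned labels are
+hypothetical.</p>
+

```python
def leadership_status(registered_by_scheduler: dict) -> str:
    """Classify the cluster by summing framework_registered values."""
    leaders = sum(registered_by_scheduler.values())
    if leaders == 1:
        return "ok"
    if leaders == 0:
        return "no_leader"
    # More than one scheduler claims leadership: possible split brain.
    return "multiple_leaders"


# Hypothetical readings from three schedulers.
status = leadership_status(
    {"scheduler-1": 0, "scheduler-2": 1, "scheduler-3": 0}
)
```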
+
+<h3
id="rate-scheduler_log_native_append_nanos_total-rate-scheduler_log_native_append_events"><code>rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)</code></h3>
+
+<p>Type: rate ratio of integer counters</p>
+
+<p>This composes two counters to compute a windowed figure for the latency of
replicated log writes.</p>
+
+<p>A hike in this value suggests disk bandwidth contention.</p>
+
+<p>Look in scheduler logs for any reported oddness with saving to the
replicated log. Also use
+standard tools like <code>vmstat</code> and <code>iotop</code> to identify
whether the disk has become slow or
+over-utilized. We suggest using a dedicated disk for the replicated log to
mitigate this.</p>
+
+<h3 id="timed_out_tasks"><code>timed_out_tasks</code></h3>
+
+<p>Type: integer counter</p>
+
+<p>Tracks the number of times the scheduler has given up while waiting
+(for <code>-transient_task_state_timeout</code>) to hear back about a task
that is in a transient state
+(e.g. <code>ASSIGNED</code>, <code>KILLING</code>), and has moved to
<code>LOST</code> before rescheduling.</p>
+
+<p>This value is currently known to increase occasionally when the scheduler
fails over
+(<a href="https://issues.apache.org/jira/browse/AURORA-740">AURORA-740</a>).
However, any large spike in this
+value warrants investigation.</p>
+
+<p>The scheduler will log when it times out a task. You should trace the task
ID of the timed out
+task into the master, slave, and/or executors to determine where the message
was dropped.</p>
+
+<h3 id="http_500_responses_events"><code>http_500_responses_events</code></h3>
+
+<p>Type: integer counter</p>
+
+<p>The total number of HTTP 500 status responses sent by the scheduler.
Includes API and asset serving.</p>
+
+<p>An increase warrants investigation.</p>
+
+<p>Look in scheduler logs to identify why the scheduler returned a 500; there
should be a stack trace.</p>
+
+</div>
+
+ </div>
+ </div>
+ <div class="container-fluid section-footer buffer">
+ <div class="container">
+ <div class="row">
+ <div class="col-md-6">
+ <p class="disclaimer">Copyright 2014 <a
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under
the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a
href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX
photo</a> displayed on the homepage is available under a <a
href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons
BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo
are trademarks of The Apache Software Foundation.</p>
+ </div>
+ </div>
+ </div>
+
+ </body>
+</html>