Added: aurora/site/publish/documentation/0.21.0/operations/configuration/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.21.0/operations/configuration/index.html?rev=1840515&view=auto ============================================================================== --- aurora/site/publish/documentation/0.21.0/operations/configuration/index.html (added) +++ aurora/site/publish/documentation/0.21.0/operations/configuration/index.html Tue Sep 11 05:28:10 2018 @@ -0,0 +1,557 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Apache Aurora</title> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> + <link href="/assets/css/main.css" rel="stylesheet"> + <!-- Analytics --> + <script type="text/javascript"> + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-45879646-1']); + _gaq.push(['_setDomainName', 'apache.org']); + _gaq.push(['_trackPageview']); + + (function() { + var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; + ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + </script> + </head> + <body> + <div class="container-fluid section-header"> + <div class="container"> + <div class="nav nav-bar"> + <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> + <ul class="nav navbar-nav navbar-right"> + <li><a href="/documentation/latest/">Documentation</a></li> + <li><a href="/community/">Community</a></li> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/blog/">Blog</a></li> + </ul> + </div> + </div> +</div> + + <div class="container-fluid"> + <div class="container content"> + <div class="col-md-12 documentation"> +<h5 class="page-header text-uppercase">Documentation +<select onChange="window.location.href='/documentation/' + this.value + '/operations/configuration/'" + value="0.21.0"> + <option value="0.21.0" + selected="selected"> + 0.21.0 + (latest) + </option> + <option value="0.20.0" + > + 0.20.0 + </option> + <option value="0.19.1" + > + 0.19.1 + </option> + <option value="0.19.0" + > + 0.19.0 + </option> + <option value="0.18.1" + > + 0.18.1 + </option> + <option value="0.18.0" + > + 0.18.0 + </option> + <option value="0.17.0" + > + 0.17.0 + </option> + <option value="0.16.0" + > + 0.16.0 + </option> + <option value="0.15.0" + > + 0.15.0 + </option> + <option value="0.14.0" + > + 0.14.0 + </option> + <option value="0.13.0" + > + 0.13.0 + </option> + <option value="0.12.0" + > + 0.12.0 + </option> + <option value="0.11.0" + > + 0.11.0 + </option> + <option value="0.10.0" + > + 0.10.0 + </option> + <option value="0.9.0" + > + 0.9.0 + </option> + <option value="0.8.0" + > + 0.8.0 + </option> + <option value="0.7.0-incubating" + > + 0.7.0-incubating + </option> + <option value="0.6.0-incubating" + > + 0.6.0-incubating + </option> + <option value="0.5.0-incubating" + > + 
0.5.0-incubating + </option> +</select> +</h5> +<h1 id="scheduler-configuration">Scheduler Configuration</h1> + +<p>The Aurora scheduler can take a variety of configuration options through command-line arguments. +Examples are available under <code>examples/scheduler/</code>. For a list of available Aurora flags and their +documentation, see <a href="../../reference/scheduler-configuration/">Scheduler Configuration Reference</a>.</p> + +<h2 id="a-note-on-configuration">A Note on Configuration</h2> + +<p>Like Mesos, Aurora uses command-line flags for runtime configuration. As such the Aurora +“configuration file” is typically a <code>scheduler.sh</code> shell script of the form.</p> +<pre class="highlight shell"><code><span style="color: #999988;font-style: italic">#!/bin/bash</span> +<span style="color: #008080">AURORA_HOME</span><span style="color: #000000;font-weight: bold">=</span>/usr/local/aurora-scheduler + +<span style="color: #999988;font-style: italic"># Flags controlling the JVM.</span> +<span style="color: #008080">JAVA_OPTS</span><span style="color: #000000;font-weight: bold">=(</span> + -Xmx2g + -Xms2g + <span style="color: #999988;font-style: italic"># GC tuning, etc.</span> +<span style="color: #000000;font-weight: bold">)</span> + +<span style="color: #999988;font-style: italic"># Flags controlling the scheduler.</span> +<span style="color: #008080">AURORA_FLAGS</span><span style="color: #000000;font-weight: bold">=(</span> + <span style="color: #999988;font-style: italic"># Port for client RPCs and the web UI</span> + -http_port<span style="color: #000000;font-weight: bold">=</span>8081 + <span style="color: #999988;font-style: italic"># Log configuration, etc.</span> +<span style="color: #000000;font-weight: bold">)</span> + +<span style="color: #999988;font-style: italic"># Environment variables controlling libmesos</span> +<span style="color: #0086B3">export </span><span style="color: #008080">JAVA_HOME</span><span style="color: 
#000000;font-weight: bold">=</span>... +<span style="color: #0086B3">export </span><span style="color: #008080">GLOG_v</span><span style="color: #000000;font-weight: bold">=</span>1 +<span style="color: #0086B3">export </span><span style="color: #008080">LIBPROCESS_PORT</span><span style="color: #000000;font-weight: bold">=</span>8083 +<span style="color: #0086B3">export </span><span style="color: #008080">LIBPROCESS_IP</span><span style="color: #000000;font-weight: bold">=</span>192.168.33.7 + +<span style="color: #008080">JAVA_OPTS</span><span style="color: #000000;font-weight: bold">=</span><span style="color: #d14">"</span><span style="color: #000000;font-weight: bold">${</span><span style="color: #008080">JAVA_OPTS</span><span style="background-color: #f8f8f8">[*]</span><span style="color: #000000;font-weight: bold">}</span><span style="color: #d14">"</span> <span style="color: #0086B3">exec</span> <span style="color: #d14">"</span><span style="color: #008080">$AURORA_HOME</span><span style="color: #d14">/bin/aurora-scheduler"</span> <span style="color: #d14">"</span><span style="color: #000000;font-weight: bold">${</span><span style="color: #008080">AURORA_FLAGS</span><span style="background-color: #f8f8f8">[@]</span><span style="color: #000000;font-weight: bold">}</span><span style="color: #d14">"</span> +</code></pre> + +<p>That way Aurora’s current flags are visible in <code>ps</code> and in the <code>/vars</code> admin endpoint.</p> + +<h2 id="jvm-configuration">JVM Configuration</h2> + +<p>JVM settings are dependent on your environment and cluster size. They might require +custom tuning. 
As a starting point, we recommend:</p> + +<ul> +<li>Ensure the initial (<code>-Xms</code>) and maximum (<code>-Xmx</code>) heap sizes are identical to prevent heap resizing +at runtime.</li> +<li>Either <code>-XX:+UseConcMarkSweepGC</code> or <code>-XX:+UseG1GC -XX:+UseStringDeduplication</code> is +a sane default for the garbage collector.</li> +<li><code>-Djava.net.preferIPv4Stack=true</code> makes sense in most cases as well.</li> +</ul> + +<h2 id="network-configuration">Network Configuration</h2> + +<p>By default, Aurora binds to all interfaces and auto-discovers its hostname. To reduce ambiguity, +it helps to hardcode them:</p> +<pre class="highlight plaintext"><code>-http_port=8081 +-ip=192.168.33.7 +-hostname="aurora1.us-east1.example.org" +</code></pre> + +<p>Two environment variables control the IP and port used for communication with the Mesos master +and for the replicated log used by Aurora:</p> +<pre class="highlight plaintext"><code>export LIBPROCESS_PORT=8083 +export LIBPROCESS_IP=192.168.33.7 +</code></pre> + +<p>It is important that this address and port can be reached from all Mesos master and Aurora scheduler instances.</p> + +<h2 id="replicated-log-configuration">Replicated Log Configuration</h2> + +<p>Aurora schedulers use ZooKeeper to discover log replicas and elect a leader. Only one scheduler is +leader at a given time; the other schedulers follow log writes and prepare to take over as leader +but do not communicate with the Mesos master. Either 3 or 5 schedulers are recommended in a +production deployment, depending on failure tolerance, and they must have persistent storage.</p> + +<p>Below is a summary of scheduler storage configuration flags that either don’t have default values +or require attention before deploying in a production environment.</p> + +<h3 id="native_log_quorum_size"><code>-native_log_quorum_size</code></h3> + +<p>Defines the Mesos replicated log quorum size. 
In a cluster with <code>N</code> schedulers, the flag +<code>-native_log_quorum_size</code> should be set to <code>floor(N/2) + 1</code>. So in a cluster with 1 scheduler +it should be set to <code>1</code>, in a cluster with 3 it should be set to <code>2</code>, and in a cluster with 5 it +should be set to <code>3</code>.</p> + +<table><thead> +<tr> +<th>Number of schedulers (N)</th> +<th><code>-native_log_quorum_size</code> setting (<code>floor(N/2) + 1</code>)</th> +</tr> +</thead><tbody> +<tr> +<td>1</td> +<td>1</td> +</tr> +<tr> +<td>3</td> +<td>2</td> +</tr> +<tr> +<td>5</td> +<td>3</td> +</tr> +<tr> +<td>7</td> +<td>4</td> +</tr> +</tbody></table> + +<p><em>Incorrectly setting this flag will cause data corruption to occur!</em></p> + +<h3 id="native_log_file_path"><code>-native_log_file_path</code></h3> + +<p>Location of the Mesos replicated log files. For optimal and consistent performance, consider +allocating a dedicated disk (preferably SSD) for the replicated log. Ensure that this disk is not +used by anything else (e.g. no process logging) and in particular that it is a real disk +and not just a partition.</p> + +<p>Even when a dedicated disk is used, switching the Linux kernel I/O scheduler from <code>CFQ</code> to <code>deadline</code> +can further improve storage performance in Aurora (<a href="https://issues.apache.org/jira/browse/AURORA-1211">see this ticket for details</a>).</p> + +<h3 id="native_log_zk_group_path"><code>-native_log_zk_group_path</code></h3> + +<p>ZooKeeper path used for Mesos replicated log quorum discovery.</p> + +<p>See the <a href="https://github.com/apache/aurora/blob/rel/0.21.0/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java">code</a> for +other available Mesos replicated log configuration options and default values.</p> + +<h3 id="changing-the-quorum-size">Changing the Quorum Size</h3> + +<p>Special care needs to be taken when changing the size of the Aurora scheduler quorum. 
+Since Aurora uses a Mesos replicated log, the steps are similar to those for +<a href="http://mesos.apache.org/documentation/latest/operational-guide">changing the Mesos quorum size</a>.</p> + +<p>In preparation, increase <code>-native_log_quorum_size</code> on each existing scheduler and restart them. +When updating from 3 to 5 schedulers, the quorum size would grow from 2 to 3.</p> + +<p>When starting the new schedulers, set <code>-native_log_quorum_size</code> to the new value. Failing to +first increase the quorum size on running schedulers can in some cases result in corruption +or truncation of the replicated log used by Aurora. In that case, see the documentation on +<a href="../backup-restore/">recovering from backup</a>.</p> + +<h2 id="backup-configuration">Backup Configuration</h2> + +<p>Configuration options for the Aurora scheduler backup manager.</p> + +<ul> +<li><code>-backup_interval</code>: The interval at which the scheduler writes local storage backups. +The default is every hour.</li> +<li><code>-backup_dir</code>: Directory to write backups to. As stated above, this should not be co-located on the +same disk as the replicated log.</li> +<li><code>-max_saved_backups</code>: Maximum number of backups to retain before deleting the oldest backup(s).</li> +</ul> + +<h2 id="resource-isolation">Resource Isolation</h2> + +<p>For proper CPU, memory, and disk isolation as mentioned in our <a href="../../features/resource-isolation/">end-user documentation</a>, +we recommend adding the following isolators to the <code>--isolation</code> flag of the Mesos agent:</p> + +<ul> +<li><code>cgroups/cpu</code></li> +<li><code>cgroups/mem</code></li> +<li><code>disk/du</code></li> +</ul> + +<p>In addition, we recommend setting the following <a href="http://mesos.apache.org/documentation/latest/configuration/">agent flags</a>:</p> + +<ul> +<li><code>--cgroups_limit_swap</code> to enable memory limits on both memory and swap instead of just memory. 
+Alternatively, you could disable swap on your agent hosts.</li> +<li><code>--cgroups_enable_cfs</code> to enable hard limits on CPU resources via the CFS bandwidth limiting +feature.</li> +<li><code>--enforce_container_disk_quota</code> to enable disk quota enforcement for containers.</li> +</ul> + +<p>To enable the optional GPU support in Mesos, please see the GPU-related flags in the +<a href="http://mesos.apache.org/documentation/latest/configuration/">Mesos configuration</a>. +To enable the corresponding feature in Aurora, you have to start the scheduler with the +flag:</p> +<pre class="highlight plaintext"><code>-allow_gpu_resource=true +</code></pre> + +<p>If you want to use revocable resources, first follow the +<a href="http://mesos.apache.org/documentation/latest/oversubscription/">Mesos oversubscription documentation</a> +and then set this Aurora scheduler flag to allow receiving revocable Mesos offers:</p> +<pre class="highlight plaintext"><code>-receive_revocable_resources=true +</code></pre> + +<p>Both CPUs and RAM are supported as revocable resources. The former is enabled by default; +the latter needs to be enabled via:</p> +<pre class="highlight plaintext"><code>-enable_revocable_ram=true +</code></pre> + +<p>Unless you want to use the <a href="https://github.com/apache/aurora/blob/rel/0.21.0/src/main/resources/org/apache/aurora/scheduler/tiers.json">default</a> +tier configuration, you will also have to specify a file path:</p> +<pre class="highlight plaintext"><code>-tier_config=path/to/tiers/config.json +</code></pre> + +<h2 id="multi-framework-setup">Multi-Framework Setup</h2> + +<p>Aurora holds onto Mesos offers in order to provide efficient scheduling and +<a href="../../features/multitenancy/#preemption">preemption</a>. 
This is problematic in multi-framework +environments as Aurora might starve other frameworks.</p> + +<p>At the cost of increased scheduling latency, Aurora can be configured to be more cooperative:</p> + +<ul> +<li>Lowering <code>-min_offer_hold_time</code> (e.g. to <code>1mins</code>) can ensure unused offers are returned to +Mesos more frequently.</li> +<li>Increasing <code>-offer_filter_duration</code> (e.g. to <code>30secs</code>) will instruct Mesos +not to re-offer rejected resources for the given duration.</li> +</ul> + +<p>Setting a <a href="http://mesos.apache.org/documentation/latest/quota/">minimum amount of resources</a> for +each Mesos role can further help to ensure no framework is starved entirely.</p> + +<h2 id="containers">Containers</h2> + +<p>Both the Mesos and Docker containerizers require configuration of the Mesos agent.</p> + +<h3 id="mesos-containerizer">Mesos Containerizer</h3> + +<p>The minimal agent configuration requires enabling Docker and Appc image support for the Mesos +containerizer:</p> +<pre class="highlight plaintext"><code>--containerizers=mesos +--image_providers=appc,docker +--isolation=filesystem/linux,docker/runtime # as an addition to your other isolators +</code></pre> + +<p>Further details can be found in the corresponding <a href="http://mesos.apache.org/documentation/latest/container-image/">Mesos documentation</a>.</p> + +<h3 id="docker-containerizer">Docker Containerizer</h3> + +<p>The <a href="http://mesos.apache.org/documentation/latest/docker-containerizer/">Docker containerizer</a> +requires that the Docker engine be installed on each agent host. 
In addition, it must be enabled on the +Mesos agents by launching them with the option:</p> +<pre class="highlight plaintext"><code>--containerizers=mesos,docker +</code></pre> + +<p>If you would like to run a container with a read-only filesystem, it may also be necessary to use +the scheduler flag <code>-thermos_home_in_sandbox</code> in order to set HOME to the sandbox +before the executor runs. This will make sure that the executor/runner PEX extraction happens +inside of the sandbox instead of the container filesystem root.</p> + +<p>If you would like to supply your own parameters to <code>docker run</code> when launching jobs in Docker +containers, you may use the following flags:</p> +<pre class="highlight plaintext"><code>-allow_docker_parameters +-default_docker_parameters +</code></pre> + +<p><code>-allow_docker_parameters</code> controls whether or not users may pass their own configuration parameters +through the job configuration files. If set to <code>false</code> (the default), the scheduler will reject +jobs with custom parameters. <em>NOTE</em>: this setting should be used with caution as it allows any job +owner to specify any parameters they wish, including those that may introduce security concerns +(<code>privileged=true</code>, for example).</p> + +<p><code>-default_docker_parameters</code> allows a cluster operator to specify a universal set of parameters that +should be used for every container that does not have parameters explicitly configured at the job +level. The argument accepts a multimap format:</p> +<pre class="highlight plaintext"><code>-default_docker_parameters="read-only=true,tmpfs=/tmp,tmpfs=/run" +</code></pre> + +<h3 id="common-options">Common Options</h3> + +<p>The following Aurora options work for both containerizers.</p> + +<p>A scheduler flag, <code>-global_container_mounts</code>, allows mounting paths from the host (i.e. the agent machine) +into all containers on that host. 
The format is a comma-separated list of <code>host_path:container_path[:mode]</code> +tuples. For example, <code>-global_container_mounts=/opt/secret_keys_dir:/mnt/secret_keys_dir:ro</code> mounts +<code>/opt/secret_keys_dir</code> from the agents into all launched containers. Valid modes are <code>ro</code> and <code>rw</code>.</p> + +<h2 id="thermos-process-logs">Thermos Process Logs</h2> + +<h3 id="log-destination">Log destination</h3> + +<p>By default, Thermos will write process stdout/stderr to log files in the sandbox. Process object +configuration allows specifying alternate log file destinations like streamed stdout/stderr or +suppression of all log output. Default behavior can be configured for the entire cluster with the +following flag (through the <code>-thermos_executor_flags</code> argument to the Aurora scheduler):</p> +<pre class="highlight plaintext"><code>--runner-logger-destination=both +</code></pre> + +<p>The <code>both</code> setting sends logs to files and also streams them to the parent stdout/stderr outputs.</p> + +<p>See <a href="../../reference/configuration/#logger">Configuration Reference</a> for all destination options.</p> + +<h3 id="log-rotation">Log rotation</h3> + +<p>By default, Thermos will not rotate the stdout/stderr logs from child processes and they will grow +without bound. An individual user may change this behavior via configuration on the Process object, +but it may also be desirable to change the default configuration for the entire cluster. +In order to enable rotation by default, the following flags can be applied to Thermos (through the +<code>-thermos_executor_flags</code> argument to the Aurora scheduler):</p> +<pre class="highlight plaintext"><code>--runner-logger-mode=rotate +--runner-rotate-log-size-mb=100 +--runner-rotate-log-backups=10 +</code></pre> + +<p>In the above example, each instance of the Thermos runner will rotate stderr/stdout logs once they +reach 100 MiB in size and keep a maximum of 10 backups. 
If a user has provided a custom setting for +their process, it will override these default settings.</p> + +<h2 id="thermos-executor-wrapper">Thermos Executor Wrapper</h2> + +<p>If you need to do computation before starting the Thermos executor (for example, setting a different +<code>--announcer-hostname</code> parameter for every executor), then the Thermos executor should be invoked +inside a wrapper script. In such a case, the Aurora scheduler should be started with +<code>-thermos_executor_path</code> pointing to the wrapper script and <code>-thermos_executor_resources</code> set to a +comma-separated string of all the resources that should be copied into the sandbox (including the +original Thermos executor). Ensure the wrapper script does not access resources outside of the +sandbox, as those resources may not exist when the script is run from within a Docker container.</p> + +<p>For example, to wrap the executor inside a simple wrapper, the scheduler would be started with: +<code>-thermos_executor_path=/path/to/wrapper.sh -thermos_executor_resources=/usr/share/aurora/bin/thermos_executor.pex</code></p> + +<h2 id="custom-executors">Custom Executors</h2> + +<p>The scheduler can be configured to utilize a custom executor by specifying the <code>-custom_executor_config</code> flag. 
+The flag must be set to the path of a valid executor configuration file.</p> + +<p>For more information on this feature please see the custom executors <a href="../../features/custom-executors/">documentation</a>.</p> + +<h2 id="a-note-on-increasing-executor-overhead">A note on increasing executor overhead</h2> + +<p>Increasing executor overhead on an existing cluster, whether it be for custom executors or for Thermos, +will result in degraded preemption performance until all tasks that were started with the previous, +lower-overhead executor configuration have been preempted or restarted.</p> + +<h2 id="controlling-mtta-via-update-affinity">Controlling MTTA via Update Affinity</h2> + +<p>When there is high resource contention in your cluster, you may experience noticeably elevated job update +times, as well as high task churn across the cluster. This is due to Aurora’s first-fit scheduling +algorithm. To alleviate this, you can enable update affinity, where the Scheduler will make a best-effort +attempt to reuse the same agent for the updated task (so long as the resources for the job are not being +increased).</p> + +<p>To enable this in the Scheduler, you can set the following options:</p> +<pre class="highlight plaintext"><code>-enable_update_affinity=true +-update_affinity_reservation_hold_time=3mins +</code></pre> + +<p>You will need to tune the hold time to match the behavior you see in your cluster. If you have extremely +high update throughput, you might have to extend it, as processing updates could easily add significant +delays between scheduling attempts. You may also have to tune scheduling parameters to achieve the +throughput you need in your cluster. 
Some relevant settings (with defaults) are:</p> +<pre class="highlight plaintext"><code>-max_schedule_attempts_per_sec=40 +-initial_schedule_penalty=1secs +-max_schedule_penalty=1mins +-scheduling_max_batch_size=3 +-max_tasks_per_schedule_attempt=5 +</code></pre> + +<p>There are metrics exposed by the Scheduler which can provide guidance on where the bottleneck is. +Example metrics to look at:</p> +<pre class="highlight plaintext"><code>- schedule_attempts_blocks (if this number is greater than 0, then task throughput is hitting + limits controlled by -max_schedule_attempts_per_sec) +- scheduled_task_penalty_* (metrics around scheduling penalties for tasks; if the numbers here are high, + then you could have high contention for resources) +</code></pre> + +<p>Most likely you’ll run into limits with the number of update instances that can be processed per minute +before you run into any other limits. So if your total work done per minute starts to exceed 2k instances, +you may need to extend <code>-update_affinity_reservation_hold_time</code>.</p> + +<h2 id="cluster-maintenance">Cluster Maintenance</h2> + +<p>Aurora performs maintenance-related task drains. How often the scheduler polls for maintenance work +can be controlled via:</p> +<pre class="highlight plaintext"><code>-host_maintenance_polling_interval=1min +</code></pre> + +<h2 id="enforcing-sla-limitations">Enforcing SLA limitations</h2> + +<p>Since tasks can specify their own <code>SLAPolicy</code>, the cluster needs to limit these SLA requirements. 
+Too aggressive a requirement can permanently block any type of maintenance work +(e.g. OS, kernel, or security upgrades) on a host and hold it hostage.</p> + +<p>An operator can control the limits for SLA requirements via these scheduler configuration options:</p> +<pre class="highlight plaintext"><code>-max_sla_duration_secs=2hrs +-min_required_instances_for_sla_check=20 +</code></pre> + +<p><em>Note: These limits only apply to <code>CountSlaPolicy</code> and <code>PercentageSlaPolicy</code>.</em></p> + +<h3 id="limiting-coordinator-sla">Limiting Coordinator SLA</h3> + +<p>With <code>CoordinatorSlaPolicy</code> the SLA calculation is off-loaded to an external HTTP service. Some +relevant scheduler configuration options are:</p> +<pre class="highlight plaintext"><code>-sla_coordinator_timeout=1min +-max_parallel_coordinated_maintenance=10 +</code></pre> + +<p>Handing off the SLA calculation to an external service can potentially block maintenance +on hosts for an indefinite amount of time (either due to a misconfigured coordinator or due to +a legitimately degraded service). In those situations the following metrics will be helpful to identify the +offending tasks:</p> +<pre class="highlight plaintext"><code>sla_coordinator_user_errors_* (counter tracking number of times the coordinator for the task + returned a bad response.) +sla_coordinator_errors_* (counter tracking number of times the scheduler was not able + to communicate with the coordinator of the task.) +sla_coordinator_lock_starvation_* (counter tracking number of times the scheduler was not able to + get the lock for the coordinator of the task.) 
+</code></pre> + +</div> + + </div> + </div> + <div class="container-fluid section-footer buffer"> + <div class="container"> + <div class="row"> + <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> + <ul> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/community/">Mailing Lists</a></li> + <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> + <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> + </ul> + </div> + <div class="col-md-2"><h3>The ASF</h3> + <ul> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-md-6"> + <p class="disclaimer">© 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + + </body> +</html>
Added: aurora/site/publish/documentation/0.21.0/operations/installation/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.21.0/operations/installation/index.html?rev=1840515&view=auto ============================================================================== --- aurora/site/publish/documentation/0.21.0/operations/installation/index.html (added) +++ aurora/site/publish/documentation/0.21.0/operations/installation/index.html Tue Sep 11 05:28:10 2018 @@ -0,0 +1,418 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Apache Aurora</title> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> + <link href="/assets/css/main.css" rel="stylesheet"> + <!-- Analytics --> + <script type="text/javascript"> + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-45879646-1']); + _gaq.push(['_setDomainName', 'apache.org']); + _gaq.push(['_trackPageview']); + + (function() { + var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; + ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + </script> + </head> + <body> + <div class="container-fluid section-header"> + <div class="container"> + <div class="nav nav-bar"> + <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> + <ul class="nav navbar-nav navbar-right"> + <li><a href="/documentation/latest/">Documentation</a></li> + <li><a href="/community/">Community</a></li> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/blog/">Blog</a></li> + </ul> + </div> + </div> +</div> + + <div class="container-fluid"> + <div class="container content"> + <div class="col-md-12 documentation"> +<h5 class="page-header text-uppercase">Documentation +<select onChange="window.location.href='/documentation/' + this.value + '/operations/installation/'" + value="0.21.0"> + <option value="0.21.0" + selected="selected"> + 0.21.0 + (latest) + </option> + <option value="0.20.0" + > + 0.20.0 + </option> + <option value="0.19.1" + > + 0.19.1 + </option> + <option value="0.19.0" + > + 0.19.0 + </option> + <option value="0.18.1" + > + 0.18.1 + </option> + <option value="0.18.0" + > + 0.18.0 + </option> + <option value="0.17.0" + > + 0.17.0 + </option> + <option value="0.16.0" + > + 0.16.0 + </option> + <option value="0.15.0" + > + 0.15.0 + </option> + <option value="0.14.0" + > + 0.14.0 + </option> + <option value="0.13.0" + > + 0.13.0 + </option> + <option value="0.12.0" + > + 0.12.0 + </option> + <option value="0.11.0" + > + 0.11.0 + </option> + <option value="0.10.0" + > + 0.10.0 + </option> + <option value="0.9.0" + > + 0.9.0 + </option> + <option value="0.8.0" + > + 0.8.0 + </option> + <option value="0.7.0-incubating" + > + 0.7.0-incubating + </option> + <option value="0.6.0-incubating" + > + 0.6.0-incubating + </option> + <option value="0.5.0-incubating" + > + 
0.5.0-incubating + </option> +</select> +</h5> +<h1 id="installing-aurora">Installing Aurora</h1> + +<p>Source and binary distributions can be found on our +<a href="https://aurora.apache.org/downloads/">downloads</a> page. Installing from binary packages is +recommended for most users.</p> + +<ul> +<li><a href="#installing-the-scheduler">Installing the scheduler</a></li> +<li><a href="#installing-worker-components">Installing worker components</a></li> +<li><a href="#installing-the-client">Installing the client</a></li> +<li><a href="#installing-mesos">Installing Mesos</a></li> +<li><a href="#troubleshooting">Troubleshooting</a></li> +</ul> + +<p>If our binary packages don’t suit you, our package build toolchain makes it easy to build your +own packages. See the <a href="https://github.com/apache/aurora-packaging">instructions</a> to learn how.</p> + +<h2 id="machine-profiles">Machine profiles</h2> + +<p>Given that many of these components communicate over the network, there are numerous ways you could +assemble them to create an Aurora cluster. The simplest way is to think in terms of three machine +profiles:</p> + +<h3 id="coordinator">Coordinator</h3> + +<p><strong>Components</strong>: ZooKeeper, Aurora scheduler, Mesos master</p> + +<p>A small number of machines (typically 3 or 5) responsible for cluster orchestration. In most cases +it is fine to co-locate these components in anything but very large clusters (> 1000 machines). +Beyond that point, operators will likely want to manage these services on separate machines. +In particular, you will want to use separate ZooKeeper ensembles for leader election and +service discovery. 
Otherwise a service discovery error or outage can take down the entire cluster.</p> + +<p>In practice, 5 coordinators have been shown to reliably manage clusters with tens of thousands of +machines.</p> + +<h3 id="worker">Worker</h3> + +<p><strong>Components</strong>: Aurora executor, Aurora observer, Mesos agent</p> + +<p>The bulk of the cluster, where services will actually run.</p> + +<h3 id="client">Client</h3> + +<p><strong>Components</strong>: Aurora client, Aurora admin client</p> + +<p>Any machines that users submit jobs from.</p> + +<h2 id="installing-the-scheduler">Installing the scheduler</h2> + +<h3 id="ubuntu-trusty">Ubuntu Trusty</h3> + +<ol> +<li><p>Install Mesos +Skip down to <a href="#mesos-on-ubuntu-trusty">install mesos</a>, then run:</p> +<pre class="highlight plaintext"><code>sudo start mesos-master +</code></pre></li> +<li><p>Install ZooKeeper</p> +<pre class="highlight plaintext"><code>sudo apt-get install -y zookeeperd +</code></pre></li> +<li><p>Install the Aurora scheduler</p> +<pre class="highlight plaintext"><code>sudo add-apt-repository -y ppa:openjdk-r/ppa +sudo apt-get update +sudo apt-get install -y openjdk-8-jre-headless wget + +sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java + +wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.17.0_amd64.deb +sudo dpkg -i aurora-scheduler_0.17.0_amd64.deb +</code></pre></li> +</ol> + +<h3 id="centos-7">CentOS 7</h3> + +<ol> +<li><p>Install Mesos +Skip down to <a href="#mesos-on-centos-7">install mesos</a>, then run:</p> +<pre class="highlight plaintext"><code>sudo systemctl start mesos-master +</code></pre></li> +<li><p>Install ZooKeeper</p> +<pre class="highlight plaintext"><code>sudo rpm -Uvh https://archive.cloudera.com/cdh4/one-click-install/redhat/6/x86_64/cloudera-cdh-4-0.x86_64.rpm +sudo yum install -y java-1.8.0-openjdk-headless zookeeper-server + +sudo service zookeeper-server init +sudo systemctl start zookeeper-server 
+</code></pre></li> +<li><p>Install the Aurora scheduler</p> +<pre class="highlight plaintext"><code>sudo yum install -y wget + +wget -c https://apache.bintray.com/aurora/centos-7/aurora-scheduler-0.17.0-1.el7.centos.aurora.x86_64.rpm +sudo yum install -y aurora-scheduler-0.17.0-1.el7.centos.aurora.x86_64.rpm +</code></pre></li> +</ol> + +<h3 id="finalizing">Finalizing</h3> + +<p>By default, the scheduler will start in an uninitialized mode. This is because external +coordination is necessary to ensure that operator error does not result in a quorum of schedulers +starting up and believing their databases are empty when in fact they should be re-joining a +cluster.</p> + +<p>Because of this, a fresh install of the scheduler will need intervention to start up. First, +stop the scheduler service. +Ubuntu: <code>sudo stop aurora-scheduler</code> +CentOS: <code>sudo systemctl stop aurora</code></p> + +<p>Now initialize the database:</p> +<pre class="highlight plaintext"><code>sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db +sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db +</code></pre> + +<p>Now you can start the scheduler back up. +Ubuntu: <code>sudo start aurora-scheduler</code> +CentOS: <code>sudo systemctl start aurora</code></p> + +<h2 id="installing-worker-components">Installing worker components</h2> + +<h3 id="ubuntu-trusty">Ubuntu Trusty</h3> + +<ol> +<li><p>Install Mesos +Skip down to <a href="#mesos-on-ubuntu-trusty">install mesos</a>, then run:</p> +<pre class="highlight plaintext"><code>sudo start mesos-slave +</code></pre></li> +<li><p>Install Aurora executor and observer</p> +<pre class="highlight plaintext"><code>sudo apt-get install -y python2.7 wget + +# NOTE: This appears to be a missing dependency of the mesos deb package and is needed +# for the python mesos native bindings. 
+sudo apt-get -y install libcurl4-nss-dev + +wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.17.0_amd64.deb +sudo dpkg -i aurora-executor_0.17.0_amd64.deb +</code></pre></li> +</ol> + +<h3 id="centos-7">CentOS 7</h3> + +<ol> +<li><p>Install Mesos +Skip down to <a href="#mesos-on-centos-7">install mesos</a>, then run:</p> +<pre class="highlight plaintext"><code>sudo systemctl start mesos-slave +</code></pre></li> +<li><p>Install Aurora executor and observer</p> +<pre class="highlight plaintext"><code>sudo yum install -y python2 wget + +wget -c https://apache.bintray.com/aurora/centos-7/aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm +sudo yum install -y aurora-executor-0.17.0-1.el7.centos.aurora.x86_64.rpm +</code></pre></li> +</ol> + +<h3 id="worker-configuration">Worker Configuration</h3> + +<p>The executor typically does not require configuration. Command line arguments can +be passed to the executor using a command line argument on the scheduler.</p> + +<p>The observer needs to be configured to look at the correct Mesos directory in order to find task +sandboxes. First, find the Mesos working directory by looking for the Mesos agent +<code>--work_dir</code> flag. You should see something like:</p> +<pre class="highlight plaintext"><code> ps -eocmd | grep "mesos-slave" | grep -v grep | tr ' ' '\n' | grep "\--work_dir" + --work_dir=/var/lib/mesos +</code></pre> + +<p>If the flag is not set, you can view the default value like so:</p> +<pre class="highlight plaintext"><code> mesos-slave --help + Usage: mesos-slave [options] + + ... + --work_dir=VALUE Directory path to place framework work directories + (default: /tmp/mesos) + ... +</code></pre> + +<p>The value you find for <code>--work_dir</code>, <code>/var/lib/mesos</code> in this example, should match the Aurora +observer value for <code>--mesos-root</code>. 
You can look for that setting in a similar way on a worker +node by grepping for <code>thermos_observer</code> and <code>--mesos-root</code>. If the flag is not set, you can view +the default value like so:</p> +<pre class="highlight plaintext"><code> thermos_observer -h + Options: + ... + --mesos-root=MESOS_ROOT + The mesos root directory to search for Thermos + executor sandboxes [default: /var/lib/mesos] + ... +</code></pre> + +<p>In this case the default is <code>/var/lib/mesos</code> and we have a match. If there is no match, you can +either adjust the Mesos agent start script(s) and restart the agent(s) or else adjust the +Aurora observer start scripts and restart the observers. To adjust the Aurora observer:</p> + +<h4 id="ubuntu-trusty">Ubuntu Trusty</h4> +<pre class="highlight plaintext"><code>sudo sh -c 'echo "MESOS_ROOT=/tmp/mesos" >> /etc/default/thermos' +</code></pre> + +<h4 id="centos-7">CentOS 7</h4> + +<p>Edit <code>/etc/sysconfig/thermos</code> to add the <code>--mesos-root</code> flag, resulting in something like:</p> +<pre class="highlight plaintext"><code>grep -A5 OBSERVER_ARGS /etc/sysconfig/thermos +OBSERVER_ARGS=( + --port=1338 + --mesos-root=/tmp/mesos + --log_to_disk=NONE + --log_to_stderr=google:INFO +) +</code></pre> + +<h2 id="installing-the-client">Installing the client</h2> + +<h3 id="ubuntu-trusty">Ubuntu Trusty</h3> +<pre class="highlight plaintext"><code>sudo apt-get install -y python2.7 wget + +wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.17.0_amd64.deb +sudo dpkg -i aurora-tools_0.17.0_amd64.deb +</code></pre> + +<h3 id="centos-7">CentOS 7</h3> +<pre class="highlight plaintext"><code>sudo yum install -y python2 wget + +wget -c https://apache.bintray.com/aurora/centos-7/aurora-tools-0.17.0-1.el7.centos.aurora.x86_64.rpm +sudo yum install -y aurora-tools-0.17.0-1.el7.centos.aurora.x86_64.rpm +</code></pre> + +<h3 id="mac-os-x">Mac OS X</h3> +<pre class="highlight plaintext"><code>brew upgrade +brew install aurora-cli +</code></pre> 
+ +<h3 id="client-configuration">Client Configuration</h3> + +<p>Client configuration lives in a JSON file that describes the clusters available and how to reach +them. By default this file is at <code>/etc/aurora/clusters.json</code>.</p> + +<p>Jobs may be submitted to the scheduler using the client, and are described with +<a href="../../reference/configuration/">job configurations</a> expressed in <code>.aurora</code> files. Typically you will +maintain a single job configuration file to describe one or more deployment environments (e.g. +dev, test, prod) for a production job.</p> + +<h2 id="installing-mesos">Installing Mesos</h2> + +<p>Mesos uses a single package for the Mesos master and agent. As a result, the package dependencies +are identical for both.</p> + +<h3 id="mesos-on-ubuntu-trusty">Mesos on Ubuntu Trusty</h3> +<pre class="highlight plaintext"><code>sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF +DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]') +CODENAME=$(lsb_release -cs) + +echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \ + sudo tee /etc/apt/sources.list.d/mesosphere.list +sudo apt-get -y update + +# Use `apt-cache showpkg mesos | grep [version]` to find the exact version. +# apt expects the package version string here, not the .deb file name. +sudo apt-get -y install mesos=1.1.0-2.0.107.ubuntu1404 +</code></pre> + +<h3 id="mesos-on-centos-7">Mesos on CentOS 7</h3> +<pre class="highlight plaintext"><code>sudo rpm -Uvh https://repos.mesosphere.io/el/7/noarch/RPMS/mesosphere-el-repo-7-1.noarch.rpm +sudo yum -y install mesos-1.1.0 +</code></pre> + +<h2 id="troubleshooting">Troubleshooting</h2> + +<p>So you’ve started your first cluster and are running into some issues? 
We’ve collected some common +stumbling blocks and solutions in our <a href="../troubleshooting/">Troubleshooting guide</a> to help get you moving.</p> + +</div> + + </div> + </div> + <div class="container-fluid section-footer buffer"> + <div class="container"> + <div class="row"> + <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> + <ul> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/community/">Mailing Lists</a></li> + <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> + <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> + </ul> + </div> + <div class="col-md-2"><h3>The ASF</h3> + <ul> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-md-6"> + <p class="disclaimer">© 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. 
Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + + </body> +</html> Added: aurora/site/publish/documentation/0.21.0/operations/monitoring/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.21.0/operations/monitoring/index.html?rev=1840515&view=auto ============================================================================== --- aurora/site/publish/documentation/0.21.0/operations/monitoring/index.html (added) +++ aurora/site/publish/documentation/0.21.0/operations/monitoring/index.html Tue Sep 11 05:28:10 2018 @@ -0,0 +1,349 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Apache Aurora</title> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> + <link href="/assets/css/main.css" rel="stylesheet"> + <!-- Analytics --> + <script type="text/javascript"> + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-45879646-1']); + _gaq.push(['_setDomainName', 'apache.org']); + _gaq.push(['_trackPageview']); + + (function() { + var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; + ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + </script> + </head> + <body> + <div class="container-fluid section-header"> + <div class="container"> + <div class="nav nav-bar"> + <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> + <ul class="nav navbar-nav navbar-right"> + <li><a href="/documentation/latest/">Documentation</a></li> + <li><a href="/community/">Community</a></li> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/blog/">Blog</a></li> + </ul> + </div> + </div> +</div> + + <div class="container-fluid"> + <div class="container content"> + <div class="col-md-12 documentation"> +<h5 class="page-header text-uppercase">Documentation +<select onChange="window.location.href='/documentation/' + this.value + '/operations/monitoring/'" + value="0.21.0"> + <option value="0.21.0" + selected="selected"> + 0.21.0 + (latest) + </option> + <option value="0.20.0" + > + 0.20.0 + </option> + <option value="0.19.1" + > + 0.19.1 + </option> + <option value="0.19.0" + > + 0.19.0 + </option> + <option value="0.18.1" + > + 0.18.1 + </option> + <option value="0.18.0" + > + 0.18.0 + </option> + <option value="0.17.0" + > + 0.17.0 + </option> + <option value="0.16.0" + > + 0.16.0 + </option> + <option value="0.15.0" + > + 0.15.0 + </option> + <option value="0.14.0" + > + 0.14.0 + </option> + <option value="0.13.0" + > + 0.13.0 + </option> + <option value="0.12.0" + > + 0.12.0 + </option> + <option value="0.11.0" + > + 0.11.0 + </option> + <option value="0.10.0" + > + 0.10.0 + </option> + <option value="0.9.0" + > + 0.9.0 + </option> + <option value="0.8.0" + > + 0.8.0 + </option> + <option value="0.7.0-incubating" + > + 0.7.0-incubating + </option> + <option value="0.6.0-incubating" + > + 0.6.0-incubating + </option> + <option value="0.5.0-incubating" + > + 0.5.0-incubating 
+ </option> +</select> +</h5> +<h1 id="monitoring-your-aurora-cluster">Monitoring your Aurora cluster</h1> + +<p>Before you start running important services in your Aurora cluster, it’s important to set up +monitoring and alerting of Aurora itself. Most of your monitoring can be against the scheduler, +since it will give you a global view of what’s going on.</p> + +<h2 id="reading-stats">Reading stats</h2> + +<p>The scheduler exposes a <em>lot</em> of instrumentation data via its HTTP interface. You can get a quick +peek at the first few of these in our vagrant image:</p> +<pre class="highlight plaintext"><code>$ vagrant ssh -c 'curl -s localhost:8081/vars | head' +async_tasks_completed 1004 +attribute_store_fetch_all_events 15 +attribute_store_fetch_all_events_per_sec 0.0 +attribute_store_fetch_all_nanos_per_event 0.0 +attribute_store_fetch_all_nanos_total 3048285 +attribute_store_fetch_all_nanos_total_per_sec 0.0 +attribute_store_fetch_one_events 3391 +attribute_store_fetch_one_events_per_sec 0.0 +attribute_store_fetch_one_nanos_per_event 0.0 +attribute_store_fetch_one_nanos_total 454690753 +</code></pre> + +<p>These values are served as <code>Content-Type: text/plain</code>, with each line containing a space-separated metric +name and value. 
Values may be integers, doubles, or strings (note: strings are static, others +may be dynamic).</p> + +<p>If your monitoring infrastructure prefers JSON, the scheduler exports that as well:</p> +<pre class="highlight plaintext"><code>$ vagrant ssh -c 'curl -s localhost:8081/vars.json | python -mjson.tool | head' +{ + "async_tasks_completed": 1009, + "attribute_store_fetch_all_events": 15, + "attribute_store_fetch_all_events_per_sec": 0.0, + "attribute_store_fetch_all_nanos_per_event": 0.0, + "attribute_store_fetch_all_nanos_total": 3048285, + "attribute_store_fetch_all_nanos_total_per_sec": 0.0, + "attribute_store_fetch_one_events": 3409, + "attribute_store_fetch_one_events_per_sec": 0.0, + "attribute_store_fetch_one_nanos_per_event": 0.0, +</code></pre> + +<p>This will be the same data as above, served with <code>Content-Type: application/json</code>.</p> + +<h2 id="viewing-live-stat-samples-on-the-scheduler">Viewing live stat samples on the scheduler</h2> + +<p>The scheduler uses the Twitter commons stats library, which keeps an internal time-series database +of exported variables - nearly everything in <code>/vars</code> is available for instant graphing. This is +useful for debugging, but is not a replacement for an external monitoring system.</p> + +<p>You can view these graphs on a scheduler at <code>/graphview</code>. It supports some composition and +aggregation of values, which can be invaluable when triaging a problem. For example, if you have +the scheduler running in vagrant, check out these links: +<a href="http://192.168.33.7:8081/graphview?query=jvm_uptime_secs">simple graph</a> +<a href="http://192.168.33.7:8081/graphview?query=rate(scheduler_log_native_append_nanos_total)%2Frate(scheduler_log_native_append_events)%2F1e6">complex composition</a></p> + +<h3 id="counters-and-gauges">Counters and gauges</h3> + +<p>Among numeric stats, there are two fundamental types of stats exported: <em>counters</em> and <em>gauges</em>. 
+Counters are guaranteed to be monotonically-increasing for the lifetime of a process, while gauges +may decrease in value. Aurora uses counters to represent things like the number of times an event +has occurred, and gauges to capture things like the current length of a queue. Counters are a +natural fit for accurate composition into <a href="http://en.wikipedia.org/wiki/Rate_ratio">rate ratios</a> +(useful for sample-resistant latency calculation), while gauges are not.</p> + +<h1 id="alerting">Alerting</h1> + +<h2 id="quickstart">Quickstart</h2> + +<p>If you are looking for just bare-minimum alerting to get something in place quickly, set up alerting +on <code>framework_registered</code> and <code>task_store_LOST</code>. These will give you a decent picture of overall +health.</p> + +<h2 id="a-note-on-thresholds">A note on thresholds</h2> + +<p>One of the most difficult things in monitoring is choosing alert thresholds. With many of these +stats, there is no value we can offer as a threshold that will be guaranteed to work for you. It +will depend on the size of your cluster, number of jobs, churn of tasks in the cluster, etc. We +recommend you start with a strict value after viewing a small amount of collected data, and then +adjust thresholds as you see fit. Feel free to ask us if you would like to validate that your alerts +and thresholds make sense.</p> + +<h2 id="important-stats">Important stats</h2> + +<h3 id="jvm_uptime_secs"><code>jvm_uptime_secs</code></h3> + +<p>Type: integer counter</p> + +<p>The number of seconds the JVM process has been running. 
Comes from +<a href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/RuntimeMXBean.html#getUptime()">RuntimeMXBean#getUptime()</a>.</p> + +<p>Detecting resets (decreasing values) on this stat will tell you that the scheduler is failing to +stay alive.</p> + +<p>Look at the scheduler logs to identify the reason the scheduler is exiting.</p> + +<h3 id="system_load_avg"><code>system_load_avg</code></h3> + +<p>Type: double gauge</p> + +<p>The current load average of the system for the last minute. Comes from +<a href="http://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html?is-external=true#getSystemLoadAverage()">OperatingSystemMXBean#getSystemLoadAverage()</a>.</p> + +<p>A high sustained value suggests that the scheduler machine may be over-utilized.</p> + +<p>Use standard Unix tools like <code>top</code> and <code>ps</code> to track down the offending process(es).</p> + +<h3 id="process_cpu_cores_utilized"><code>process_cpu_cores_utilized</code></h3> + +<p>Type: double gauge</p> + +<p>The current number of CPU cores in use by the JVM process. This should not exceed the number of +logical CPU cores on the machine. Derived from +<a href="http://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html">OperatingSystemMXBean#getProcessCpuTime()</a>.</p> + +<p>A high sustained value indicates that the scheduler is overworked. Due to current internal design +limitations, if this value is sustained at <code>1</code>, there is a good chance the scheduler is overloaded.</p> + +<p>There are two main inputs that tend to drive this figure: task scheduling attempts and status +updates from Mesos. You may see activity in the scheduler logs to give an indication of where +time is being spent. Beyond that, it really takes good familiarity with the code to effectively +triage this. 
We suggest engaging with an Aurora developer.</p> + +<h3 id="task_store_lost"><code>task_store_LOST</code></h3> + +<p>Type: integer gauge</p> + +<p>The number of tasks stored in the scheduler that are in the <code>LOST</code> state, and have been rescheduled.</p> + +<p>If this value is increasing at a high rate, it is a sign of trouble.</p> + +<p>There are many sources of <code>LOST</code> tasks in Mesos: the scheduler, master, agent, and executor can all +trigger this. The first step is to look in the scheduler logs for <code>LOST</code> to identify where the +state changes are originating.</p> + +<h3 id="scheduler_resource_offers"><code>scheduler_resource_offers</code></h3> + +<p>Type: integer counter</p> + +<p>The number of resource offers that the scheduler has received.</p> + +<p>For a healthy scheduler, this value must be increasing over time.</p> + +<p>Assuming the scheduler is up and otherwise healthy, you will want to check if the master thinks it +is sending offers. You should also look at the master’s web interface to see if it has a large +number of outstanding offers that it is waiting to have returned.</p> + +<h3 id="framework_registered"><code>framework_registered</code></h3> + +<p>Type: binary integer counter</p> + +<p>Will be <code>1</code> for the leading scheduler that is registered with the Mesos master, <code>0</code> for passive +schedulers.</p> + +<p>A sustained period without a <code>1</code> (or where <code>sum() != 1</code>) warrants investigation.</p> + +<p>If there is no leading scheduler, look in the scheduler and master logs for why. 
If there are +multiple schedulers claiming leadership, this suggests a split brain and warrants filing a critical +bug.</p> + +<h3 id="rate-scheduler_log_native_append_nanos_total-rate-scheduler_log_native_append_events"><code>rate(scheduler_log_native_append_nanos_total)/rate(scheduler_log_native_append_events)</code></h3> + +<p>Type: rate ratio of integer counters</p> + +<p>This composes two counters to compute a windowed figure for the latency of replicated log writes.</p> + +<p>A hike in this value suggests disk bandwidth contention.</p> + +<p>Look in scheduler logs for any reported oddness with saving to the replicated log. Also use +standard tools like <code>vmstat</code> and <code>iotop</code> to identify whether the disk has become slow or +over-utilized. We suggest using a dedicated disk for the replicated log to mitigate this.</p> + +<h3 id="timed_out_tasks"><code>timed_out_tasks</code></h3> + +<p>Type: integer counter</p> + +<p>Tracks the number of times the scheduler has given up while waiting +(for <code>-transient_task_state_timeout</code>) to hear back about a task that is in a transient state +(e.g. <code>ASSIGNED</code>, <code>KILLING</code>), and has moved to <code>LOST</code> before rescheduling.</p> + +<p>This value is currently known to increase occasionally when the scheduler fails over +(<a href="https://issues.apache.org/jira/browse/AURORA-740">AURORA-740</a>). However, any large spike in this +value warrants investigation.</p> + +<p>The scheduler will log when it times out a task. You should trace the task ID of the timed out +task into the master, agent, and/or executors to determine where the message was dropped.</p> + +<h3 id="http_500_responses_events"><code>http_500_responses_events</code></h3> + +<p>Type: integer counter</p> + +<p>The total number of HTTP 500 status responses sent by the scheduler. 
Includes API and asset serving.</p> + +<p>An increase warrants investigation.</p> + +<p>Look in scheduler logs to identify why the scheduler returned a 500; there should be a stack trace.</p> + +</div> + + </div> + </div> + <div class="container-fluid section-footer buffer"> + <div class="container"> + <div class="row"> + <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> + <ul> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/community/">Mailing Lists</a></li> + <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> + <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> + </ul> + </div> + <div class="col-md-2"><h3>The ASF</h3> + <ul> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-md-6"> + <p class="disclaimer">© 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. 
Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + + </body> +</html> Added: aurora/site/publish/documentation/0.21.0/operations/security/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.21.0/operations/security/index.html?rev=1840515&view=auto ============================================================================== --- aurora/site/publish/documentation/0.21.0/operations/security/index.html (added) +++ aurora/site/publish/documentation/0.21.0/operations/security/index.html Tue Sep 11 05:28:10 2018 @@ -0,0 +1,509 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Apache Aurora</title> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> + <link href="/assets/css/main.css" rel="stylesheet"> + <!-- Analytics --> + <script type="text/javascript"> + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-45879646-1']); + _gaq.push(['_setDomainName', 'apache.org']); + _gaq.push(['_trackPageview']); + + (function() { + var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; + ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + </script> + </head> + <body> + <div class="container-fluid section-header"> + <div class="container"> + <div class="nav nav-bar"> + <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> + <ul class="nav navbar-nav navbar-right"> + <li><a href="/documentation/latest/">Documentation</a></li> + <li><a href="/community/">Community</a></li> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/blog/">Blog</a></li> + </ul> + </div> + </div> +</div> + + <div class="container-fluid"> + <div class="container content"> + <div class="col-md-12 documentation"> +<h5 class="page-header text-uppercase">Documentation +<select onChange="window.location.href='/documentation/' + this.value + '/operations/security/'" + value="0.21.0"> + <option value="0.21.0" + selected="selected"> + 0.21.0 + (latest) + </option> + <option value="0.20.0" + > + 0.20.0 + </option> + <option value="0.19.1" + > + 0.19.1 + </option> + <option value="0.19.0" + > + 0.19.0 + </option> + <option value="0.18.1" + > + 0.18.1 + </option> + <option value="0.18.0" + > + 0.18.0 + </option> + <option value="0.17.0" + > + 0.17.0 + </option> + <option value="0.16.0" + > + 0.16.0 + </option> + <option value="0.15.0" + > + 0.15.0 + </option> + <option value="0.14.0" + > + 0.14.0 + </option> + <option value="0.13.0" + > + 0.13.0 + </option> + <option value="0.12.0" + > + 0.12.0 + </option> + <option value="0.11.0" + > + 0.11.0 + </option> + <option value="0.10.0" + > + 0.10.0 + </option> + <option value="0.9.0" + > + 0.9.0 + </option> + <option value="0.8.0" + > + 0.8.0 + </option> + <option value="0.7.0-incubating" + > + 0.7.0-incubating + </option> + <option value="0.6.0-incubating" + > + 0.6.0-incubating + </option> + <option value="0.5.0-incubating" + > + 0.5.0-incubating + 
</option> +</select> +</h5> +<h1 id="securing-your-aurora-cluster">Securing your Aurora Cluster</h1> + +<p>Aurora integrates with <a href="http://shiro.apache.org/">Apache Shiro</a> to provide security +controls for its API. In addition to providing some useful features out of the box, Shiro +also allows Aurora cluster administrators to adapt the security system to their organization’s +existing infrastructure. The announcer in the Aurora thermos executor also supports security +controls for talking to ZooKeeper.</p> + +<ul> +<li><a href="#enabling-security">Enabling Security</a></li> +<li><a href="#authentication">Authentication</a> + +<ul> +<li><a href="#http-basic-authentication">HTTP Basic Authentication</a> + +<ul> +<li><a href="#server-configuration">Server Configuration</a></li> +<li><a href="#client-configuration">Client Configuration</a></li> +</ul></li> +<li><a href="#http-spnego-authentication-kerberos">HTTP SPNEGO Authentication (Kerberos)</a> + +<ul> +<li><a href="#server-configuration-1">Server Configuration</a></li> +<li><a href="#client-configuration-1">Client Configuration</a></li> +</ul></li> +</ul></li> +<li><a href="#authorization">Authorization</a> + +<ul> +<li><a href="#using-an-ini-file-to-define-security-controls">Using an INI file to define security controls</a> + +<ul> +<li><a href="#caveats">Caveats</a></li> +</ul></li> +</ul></li> +<li><a href="#implementing-a-custom-realm">Implementing a Custom Realm</a> + +<ul> +<li><a href="#packaging-a-realm-module">Packaging a realm module</a></li> +</ul></li> +<li><a href="#announcer-authentication">Announcer Authentication</a> + +<ul> +<li><a href="#zookeeper-authentication-configuration">ZooKeeper authentication configuration</a></li> +<li><a href="#executor-settings">Executor settings</a></li> +</ul></li> +<li><a href="#scheduler-https">Scheduler HTTPS</a></li> +<li><a href="#known-issues">Known Issues</a></li> +</ul> + +<h1 id="enabling-security">Enabling Security</h1> + +<p>There are two major 
components of security: +<a href="http://en.wikipedia.org/wiki/Authentication#Authorization">authentication and authorization</a>. A +cluster administrator may choose the approach used for each, and may also implement custom +mechanisms for either. Later sections describe the options available. To enable authentication + for the announcer, see <a href="#announcer-authentication">Announcer Authentication</a>.</p> + +<h1 id="authentication">Authentication</h1> + +<p>At a minimum, the scheduler must be configured with instructions for how to process authentication +credentials. There are currently two built-in authentication schemes: +<a href="http://en.wikipedia.org/wiki/Basic_access_authentication">HTTP Basic Authentication</a> and +<a href="http://en.wikipedia.org/wiki/SPNEGO">SPNEGO</a> (Kerberos).</p> + +<h2 id="http-basic-authentication">HTTP Basic Authentication</h2> + +<p>Basic Authentication is a very quick way to add <em>some</em> security. It is supported +by all major browsers and HTTP client libraries with minimal work. However, +before relying on Basic Authentication you should be aware of the <a href="http://tools.ietf.org/html/rfc2617#section-4">security +considerations</a>.</p> + +<h3 id="server-configuration">Server Configuration</h3> + +<p>At a minimum you need to set 3 command-line flags on the scheduler:</p> +<pre class="highlight plaintext"><code>-http_authentication_mechanism=BASIC +-shiro_realm_modules=INI_AUTHNZ +-shiro_ini_path=path/to/security.ini +</code></pre> + +<p>And create a security.ini file like so:</p> +<pre class="highlight plaintext"><code>[users] +sally = apple, admin + +[roles] +admin = * +</code></pre> + +<p>The details of the security.ini file are explained below. 
Note that this file contains plaintext, +unhashed passwords.</p> + +<h3 id="client-configuration">Client Configuration</h3> + +<p>To configure the client for HTTP Basic authentication, add an entry to <code>~/.netrc</code> with your credentials:</p> +<pre class="highlight plaintext"><code>% cat ~/.netrc +# ... + +machine aurora.example.com +login sally +password apple + +# ... +</code></pre> + +<p>No changes are required to <code>clusters.json</code>.</p> + +<h2 id="http-spnego-authentication-kerberos">HTTP SPNEGO Authentication (Kerberos)</h2> + +<h3 id="server-configuration">Server Configuration</h3> + +<p>At a minimum you need to set the following 5 command-line flags on the scheduler:</p> +<pre class="highlight plaintext"><code>-http_authentication_mechanism=NEGOTIATE +-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ +-kerberos_server_principal=HTTP/[email protected] +-kerberos_server_keytab=path/to/aurora.example.com.keytab +-shiro_ini_path=path/to/security.ini +</code></pre> + +<p>And create a security.ini file like so:</p> +<pre class="highlight plaintext"><code>% cat path/to/security.ini +[users] +sally = _, admin + +[roles] +admin = * +</code></pre> + +<p>What’s going on here? First, Aurora must be configured to request Kerberos credentials when presented with an +unauthenticated request. This is achieved by setting:</p> +<pre class="highlight plaintext"><code>-http_authentication_mechanism=NEGOTIATE +</code></pre> + +<p>Next, a Realm module must be configured to <strong>authenticate</strong> the current request using the Kerberos +credentials that were requested. Aurora ships with a realm module that can do this:</p> +<pre class="highlight plaintext"><code>-shiro_realm_modules=KERBEROS5_AUTHN[,...] +</code></pre> + +<p>The Kerberos5Realm requires a keytab file and a server principal name. 
The principal name will usually +be in the form <code>HTTP/[email protected]</code>.</p> +<pre class="highlight plaintext"><code>-kerberos_server_principal=HTTP/[email protected] +-kerberos_server_keytab=path/to/aurora.example.com.keytab +</code></pre> + +<p>The Kerberos5 realm module is authentication-only. For scheduler security to work you must also +enable a realm module that provides an Authorizer implementation. For example, to do this using the +IniShiroRealmModule:</p> +<pre class="highlight plaintext"><code>-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ +</code></pre> + +<p>You can then configure authorization using a security.ini file as described below +(the password field is ignored). You must configure the realm module with the path to this file:</p> +<pre class="highlight plaintext"><code>-shiro_ini_path=path/to/security.ini +</code></pre> + +<h3 id="client-configuration">Client Configuration</h3> + +<p>To use Kerberos on the client side you must build Kerberos-enabled client binaries. Do this with:</p> +<pre class="highlight plaintext"><code>./pants binary src/main/python/apache/aurora/kerberos:kaurora +./pants binary src/main/python/apache/aurora/kerberos:kaurora_admin +</code></pre> + +<p>You must also configure each cluster where you’ve enabled Kerberos on the scheduler +to use Kerberos authentication. Do this by setting <code>auth_mechanism</code> to <code>KERBEROS</code> +in <code>clusters.json</code>.</p> +<pre class="highlight plaintext"><code>% cat ~/.aurora/clusters.json +{ + "devcluster": { + "auth_mechanism": "KERBEROS", + ... + }, + ... +} +</code></pre> + +<h1 id="authorization">Authorization</h1> + +<p>Once a client’s claimed identity can be authenticated, we need to define what privileges that identity has.</p> + +<h2 id="using-an-ini-file-to-define-security-controls">Using an INI file to define security controls</h2> + +<p>The simplest security configuration for Aurora is an INI file on the scheduler. 
For small +clusters, or clusters where the users and access controls change relatively infrequently, this is +likely the preferred approach. However, you may want to avoid this approach if access permissions +are rapidly changing, or if your access control information already exists in another system.</p> + +<p>You can enable INI-based configuration with the following scheduler command line arguments:</p> +<pre class="highlight plaintext"><code>-http_authentication_mechanism=BASIC +-shiro_ini_path=path/to/security.ini +</code></pre> + +<p><em>Note:</em> As the argument name reveals, this is using Shiro’s +<a href="http://shiro.apache.org/configuration.html#Configuration-INIConfiguration">IniRealm</a> behind +the scenes.</p> + +<p>The INI file will contain two sections: users and roles. Here’s an example of what might +be in security.ini:</p> +<pre class="highlight plaintext"><code>[users] +sally = apple, admin +jim = 123456, accounting +becky = letmein, webapp +larry = 654321, accounting +steve = password + +[roles] +admin = * +accounting = thrift.AuroraAdmin:setQuota +webapp = thrift.AuroraSchedulerManager:*:webapp +</code></pre> + +<p>The users section defines user credentials and the role(s) each user is a member of. These lines +are of the format <code>&lt;user&gt; = &lt;password&gt;[, &lt;role&gt;...]</code>. 
As you probably noticed, the passwords are +in plaintext, so read access to this file should be restricted.</p> + +<p>In this configuration, each user has different privileges for actions in the cluster because +of the roles they are a part of:</p> + +<ul> +<li>admin is granted all privileges</li> +<li>accounting may adjust the amount of resource quota for any role</li> +<li>webapp represents a collection of jobs that make up a service, and its members may create and modify any jobs owned by it</li> +</ul> + +<h3 id="caveats">Caveats</h3> + +<p>You might find documentation on the Internet suggesting there are additional sections in <code>shiro.ini</code>, +like <code>[main]</code> and <code>[urls]</code>. These are not supported by Aurora as it uses a different mechanism to configure +those parts of Shiro. Think of Aurora’s <code>security.ini</code> as a subset with only <code>[users]</code> and <code>[roles]</code> sections.</p> + +<h2 id="implementing-delegated-authorization">Implementing Delegated Authorization</h2> + +<p>It is possible to leverage Shiro’s <code>runAs</code> feature by implementing a custom Servlet Filter that provides +the capability and passing its fully qualified class name to the command line argument +<code>-shiro_after_auth_filter</code>. The filter is registered in the same filter chain as the Shiro auth filters +and is placed after them. 
This ensures that the Filter is invoked +after the Shiro filters have had a chance to authenticate the request.</p> + +<h1 id="implementing-a-custom-realm">Implementing a Custom Realm</h1> + +<p>Since Aurora’s security is backed by <a href="https://shiro.apache.org">Apache Shiro</a>, you can implement a +custom <a href="http://shiro.apache.org/realm.html">Realm</a> to define organization-specific security behavior.</p> + +<p>In addition to using Shiro’s standard APIs to implement a Realm you can link against Aurora to +access the type-safe Permissions Aurora uses. See the Javadoc for <code>org.apache.aurora.scheduler.spi</code> +for more information.</p> + +<h2 id="packaging-a-realm-module">Packaging a realm module</h2> + +<p>Package your custom Realm(s) with a Guice module that exposes a <code>Set&lt;Realm&gt;</code> multibinding.</p> +<pre class="highlight java"><code><span style="color: #000000;font-weight: bold">package</span> <span style="background-color: #f8f8f8">com</span><span style="color: #000000;font-weight: bold">.</span><span style="color: #008080">example</span><span style="color: #000000;font-weight: bold">;</span> + +<span style="color: #000000;font-weight: bold">import</span> <span style="color: #555555">com.google.inject.AbstractModule</span><span style="color: #000000;font-weight: bold">;</span> +<span style="color: #000000;font-weight: bold">import</span> <span style="color: #555555">com.google.inject.multibindings.Multibinder</span><span style="color: #000000;font-weight: bold">;</span> +<span style="color: #000000;font-weight: bold">import</span> <span style="color: #555555">org.apache.shiro.realm.Realm</span><span style="color: #000000;font-weight: bold">;</span> + +<span style="color: #000000;font-weight: bold">public</span> <span style="color: #000000;font-weight: bold">class</span> <span style="color: #445588;font-weight: bold">MyRealmModule</span> <span style="color: #000000;font-weight: bold">extends</span> <span style="background-color: 
#f8f8f8">AbstractModule</span> <span style="color: #000000;font-weight: bold">{</span> + <span style="color: #3c5d5d;font-weight: bold">@Override</span> + <span style="color: #000000;font-weight: bold">public</span> <span style="color: #445588;font-weight: bold">void</span> <span style="background-color: #f8f8f8">configure</span><span style="color: #000000;font-weight: bold">()</span> <span style="color: #000000;font-weight: bold">{</span> + <span style="background-color: #f8f8f8">Realm</span> <span style="background-color: #f8f8f8">myRealm</span> <span style="color: #000000;font-weight: bold">=</span> <span style="color: #000000;font-weight: bold">new</span> <span style="background-color: #f8f8f8">MyRealm</span><span style="color: #000000;font-weight: bold">();</span> + + <span style="background-color: #f8f8f8">Multibinder</span><span style="color: #000000;font-weight: bold">.</span><span style="color: #008080">newSetBinder</span><span style="color: #000000;font-weight: bold">(</span><span style="background-color: #f8f8f8">binder</span><span style="color: #000000;font-weight: bold">(),</span> <span style="background-color: #f8f8f8">Realm</span><span style="color: #000000;font-weight: bold">.</span><span style="color: #008080">class</span><span style="color: #000000;font-weight: bold">).</span><span style="color: #008080">addBinding</span><span style="color: #000000;font-weight: bold">().</span><span style="color: #008080">toInstance</span><span style="color: #000000;font-weight: bold">(</span><span style="background-color: #f8f8f8">myRealm</span><span style="color: #000000;font-weight: bold">);</span> + <span style="color: #000000;font-weight: bold">}</span> + + <span style="color: #000000;font-weight: bold">static</span> <span style="color: #000000;font-weight: bold">class</span> <span style="color: #445588;font-weight: bold">MyRealm</span> <span style="color: #000000;font-weight: bold">implements</span> <span style="background-color: #f8f8f8">Realm</span> <span 
style="color: #000000;font-weight: bold">{</span> + <span style="color: #999988;font-style: italic">// Realm implementation.</span> + <span style="color: #000000;font-weight: bold">}</span> +<span style="color: #000000;font-weight: bold">}</span> +</code></pre> + +<p>To use your module in the scheduler, include it as a realm module based on its fully-qualified +class name:</p> +<pre class="highlight plaintext"><code>-shiro_realm_modules=KERBEROS5_AUTHN,INI_AUTHNZ,com.example.MyRealmModule +</code></pre> + +<h1 id="announcer-authentication">Announcer Authentication</h1> + +<p>The Thermos executor can be configured to authenticate with ZooKeeper and include +an <a href="https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#sc_ZooKeeperAccessControl">ACL</a> +on the nodes it creates, which will specify +the privileges of clients to perform different actions on these nodes. This +feature is enabled by specifying an ACL configuration file to the executor with the +<code>--announcer-zookeeper-auth-config</code> command line argument.</p> + +<p>When this feature is <em>not</em> enabled, nodes created by the executor will have ‘world/all’ permission +(<code>ZOO_OPEN_ACL_UNSAFE</code>). 
In most production environments, operators should specify an ACL and +limit access.</p> + +<h2 id="zookeeper-authentication-configuration">ZooKeeper Authentication Configuration</h2> + +<p>The configuration file must be formatted as JSON with the following schema:</p> +<pre class="highlight json"><code><span style="background-color: #f8f8f8">{</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"auth"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="background-color: #f8f8f8">[</span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">{</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"scheme"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #d14">"<scheme>"</span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"credential"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #d14">"<plain_credential>"</span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">}</span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">],</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"acl"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="background-color: #f8f8f8">[</span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">{</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"scheme"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #d14">"<scheme>"</span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"credential"</span><span style="background-color: 
#f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #d14">"<plain_credential>"</span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"permissions"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="background-color: #f8f8f8">{</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"read"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #a61717;background-color: #e3d2d2"><bool></span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"write"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #a61717;background-color: #e3d2d2"><bool></span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"create"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #a61717;background-color: #e3d2d2"><bool></span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"delete"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #a61717;background-color: #e3d2d2"><bool></span><span style="background-color: #f8f8f8">,</span><span style="color: #bbbbbb"> + </span><span style="color: #000080">"admin"</span><span style="background-color: #f8f8f8">:</span><span style="color: #bbbbbb"> </span><span style="color: #a61717;background-color: #e3d2d2"><bool></span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">}</span><span style="color: #bbbbbb"> + </span><span style="background-color: #f8f8f8">}</span><span style="color: #bbbbbb"> + </span><span style="background-color: 
#f8f8f8">]</span><span style="color: #bbbbbb"> +</span><span style="background-color: #f8f8f8">}</span><span style="color: #bbbbbb"> +</span></code></pre> + +<p>The <code>scheme</code> +defines the encoding of the credential field. Note that these fields are passed directly to +ZooKeeper (except in the case of <em>digest</em> scheme, where the executor will hash and encode +the credential appropriately before passing it to ZooKeeper). In addition to <code>acl</code>, a list of +authentication credentials must be provided in <code>auth</code> to use for the connection.</p> + +<p>All properties of the <code>permissions</code> object will default to False if not provided.</p> + +<h2 id="executor-settings">Executor settings</h2> + +<p>To enable the executor to authenticate against ZK, <code>--announcer-zookeeper-auth-config</code> should be +set to the configuration file.</p> + +<h1 id="scheduler-https">Scheduler HTTPS</h1> + +<p>The Aurora scheduler does not provide native HTTPS support (<a href="https://issues.apache.org/jira/browse/AURORA-343">AURORA-343</a>). +It is therefore recommended to deploy it behind an HTTPS capable reverse proxy such as nginx or Apache2.</p> + +<p>A simple setup is to launch both the reverse proxy and the Aurora scheduler on the same port, but +bind the reverse proxy to the public IP of the host and the scheduler to localhost:</p> +<pre class="highlight plaintext"><code>-ip=127.0.0.1 +-http_port=8081 +</code></pre> + +<p>If your clients connect to the scheduler via <a href="../../reference/scheduler-configuration/"><code>proxy_url</code></a>, +you can update it to <code>https</code>. 
If you use ZooKeeper-based discovery instead, the scheduler +needs to be launched via</p> +<pre class="highlight plaintext"><code>-serverset_endpoint_name=https +</code></pre> + +<p>in order to announce its HTTPS support within ZooKeeper.</p> + +<h1 id="known-issues">Known Issues</h1> + +<p>While the APIs and SPIs we ship with are stable as of 0.8.0, we are aware of several incremental +improvements still to be made. Please follow, vote, or send patches.</p> + +<p>Relevant tickets:</p> + +<ul> +<li><a href="https://issues.apache.org/jira/browse/AURORA-1248">AURORA-1248</a>: Client retries 4xx errors</li> +<li><a href="https://issues.apache.org/jira/browse/AURORA-1279">AURORA-1279</a>: Remove kerberos-specific build targets</li> +<li><a href="https://issues.apache.org/jira/browse/AURORA-1291">AURORA-1293</a>: Consider defining a JSON format in place of INI</li> +<li><a href="https://issues.apache.org/jira/browse/AURORA-1179">AURORA-1179</a>: Support hashed passwords in security.ini</li> +<li><a href="https://issues.apache.org/jira/browse/AURORA-1295">AURORA-1295</a>: Support security for the ReadOnlyScheduler service</li> +</ul> + +</div> + + </div> + </div> + <div class="container-fluid section-footer buffer"> + <div class="container"> + <div class="row"> + <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> + <ul> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/community/">Mailing Lists</a></li> + <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> + <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> + </ul> + </div> + <div class="col-md-2"><h3>The ASF</h3> + <ul> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-md-6"> + <p class="disclaimer">© 2014-2017 <a 
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + + </body> +</html>
