Added: aurora/site/publish/documentation/0.19.0/reference/task-lifecycle/index.html URL: http://svn.apache.org/viewvc/aurora/site/publish/documentation/0.19.0/reference/task-lifecycle/index.html?rev=1814961&view=auto ============================================================================== --- aurora/site/publish/documentation/0.19.0/reference/task-lifecycle/index.html (added) +++ aurora/site/publish/documentation/0.19.0/reference/task-lifecycle/index.html Sat Nov 11 16:49:46 2017 @@ -0,0 +1,292 @@ +<!DOCTYPE html> +<html lang="en"> + <head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Apache Aurora</title> + <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> + <link href="/assets/css/main.css" rel="stylesheet"> + <!-- Analytics --> + <script type="text/javascript"> + var _gaq = _gaq || []; + _gaq.push(['_setAccount', 'UA-45879646-1']); + _gaq.push(['_setDomainName', 'apache.org']); + _gaq.push(['_trackPageview']); + + (function() { + var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; + ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; + var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); + })(); + </script> + </head> + <body> + <div class="container-fluid section-header"> + <div class="container"> + <div class="nav nav-bar"> + <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> + <ul class="nav navbar-nav navbar-right"> + <li><a href="/documentation/latest/">Documentation</a></li> + <li><a href="/community/">Community</a></li> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/blog/">Blog</a></li> + </ul> + </div> + </div> +</div> + + <div class="container-fluid"> + <div class="container content"> + <div class="col-md-12 documentation"> +<h5 class="page-header text-uppercase">Documentation +<select onChange="window.location.href='/documentation/' + this.value + '/reference/task-lifecycle/'" + value="0.19.0"> + <option value="0.19.0" + selected="selected"> + 0.19.0 + (latest) + </option> + <option value="0.18.1" + > + 0.18.1 + </option> + <option value="0.18.0" + > + 0.18.0 + </option> + <option value="0.17.0" + > + 0.17.0 + </option> + <option value="0.16.0" + > + 0.16.0 + </option> + <option value="0.15.0" + > + 0.15.0 + </option> + <option value="0.14.0" + > + 0.14.0 + </option> + <option value="0.13.0" + > + 0.13.0 + </option> + <option value="0.12.0" + > + 0.12.0 + </option> + <option value="0.11.0" + > + 0.11.0 + </option> + <option value="0.10.0" + > + 0.10.0 + </option> + <option value="0.9.0" + > + 0.9.0 + </option> + <option value="0.8.0" + > + 0.8.0 + </option> + <option value="0.7.0-incubating" + > + 0.7.0-incubating + </option> + <option value="0.6.0-incubating" + > + 0.6.0-incubating + </option> + <option value="0.5.0-incubating" + > + 0.5.0-incubating + </option> +</select> +</h5> +<h1 id="task-lifecycle">Task Lifecycle</h1> + +<p>When Aurora reads a configuration file and finds a 
<code>Job</code> definition, it:</p>
+
+<ol>
+<li> Evaluates the <code>Job</code> definition.</li>
+<li> Splits the <code>Job</code> into its constituent <code>Task</code>s.</li>
+<li> Sends those <code>Task</code>s to the scheduler.</li>
+<li> The scheduler puts the <code>Task</code>s into <code>PENDING</code> state, starting each
+<code>Task</code>’s lifecycle.</li>
+</ol>
+
+<p><img alt="Life of a task" src="../../images/lifeofatask.png" /></p>
+
+<p>Please note that a couple of the task states described below are missing from
+this state diagram.</p>
+
+<h2 id="pending-to-running-states">PENDING to RUNNING states</h2>
+
+<p>When a <code>Task</code> is in the <code>PENDING</code> state, the scheduler constantly
+searches for machines satisfying that <code>Task</code>’s resource request
+requirements (RAM, disk space, CPU time) while maintaining configuration
+constraints such as “a <code>Task</code> must run on machines dedicated to a
+particular role” or attribute limit constraints such as “at most 2
+<code>Task</code>s from the same <code>Job</code> may run on each rack”. When the scheduler
+finds a suitable match, it assigns the <code>Task</code> to a machine and puts the
+<code>Task</code> into the <code>ASSIGNED</code> state.</p>
+
+<p>From the <code>ASSIGNED</code> state, the scheduler sends an RPC to the agent
+machine containing the <code>Task</code> configuration, which the agent uses to spawn
+an executor responsible for the <code>Task</code>’s lifecycle. When the scheduler
+receives an acknowledgment that the machine has accepted the <code>Task</code>,
+the <code>Task</code> goes into <code>STARTING</code> state.</p>
+
+<p>The <code>STARTING</code> state initializes the <code>Task</code> sandbox. When the sandbox is fully
+initialized, Thermos begins to invoke <code>Process</code>es. The agent
+machine then reports to the scheduler that the <code>Task</code> is
+in <code>RUNNING</code> state, but only once the task satisfies its liveness requirements.
+See <a href="../features/services#health-checking">Health Checking</a> for more details
+on how to configure health checks.</p>
+
+<h2 id="running-to-terminal-states">RUNNING to terminal states</h2>
+
+<p>There are various ways that an active <code>Task</code> can transition into a terminal
+state. By definition, a <code>Task</code> can never leave a terminal state. However, depending on
+the nature of the termination and the originating <code>Job</code> definition
+(e.g. <code>service</code>, <code>max_task_failures</code>), a replacement <code>Task</code> might be
+scheduled.</p>
+
+<h3 id="natural-termination-finished-failed">Natural Termination: FINISHED, FAILED</h3>
+
+<p>A <code>RUNNING</code> <code>Task</code> can terminate without direct user interaction. For
+example, it may be a finite computation that finishes, even something as
+simple as <code>echo hello world</code>, or it could be an exceptional condition in
+a long-lived service. If the <code>Task</code> is successful (its underlying
+processes have succeeded with exit status <code>0</code> or finished without
+reaching failure limits) it moves into <code>FINISHED</code> state. If it finished
+after reaching a set of failure limits, it goes into <code>FAILED</code> state.</p>
+
+<p>A terminated <code>Task</code> which is subject to rescheduling will be temporarily
+<code>THROTTLED</code> if it is considered to be flapping. A task is considered flapping if its
+previous invocation was terminated after less than 5 minutes (the scheduler
+default). The time penalty a task has to remain in the <code>THROTTLED</code> state
+before it is eligible for rescheduling increases with each consecutive
+failure.</p>
+
+<h3 id="forceful-termination-killing-restarting">Forceful Termination: KILLING, RESTARTING</h3>
+
+<p>You can terminate a <code>Task</code> by issuing an <code>aurora job kill</code> command, which
+moves it into <code>KILLING</code> state. The scheduler then sends the agent a
+request to terminate the <code>Task</code>. 
If the scheduler receives a successful
+response, it moves the <code>Task</code> into <code>KILLED</code> state and never restarts it.</p>
+
+<p>If a <code>Task</code> is forced into the <code>RESTARTING</code> state via the <code>aurora job restart</code>
+command, the scheduler kills the underlying task but in parallel schedules
+an identical replacement for it.</p>
+
+<p>In any case, the responsible executor on the agent follows an escalation
+sequence when killing a running task:</p>
+
+<ol>
+<li>If a <code>HttpLifecycleConfig</code> is not present, skip to (4).</li>
+<li>Send a POST to the <code>graceful_shutdown_endpoint</code> and wait
+<code>graceful_shutdown_wait_secs</code> seconds.</li>
+<li>Send a POST to the <code>shutdown_endpoint</code> and wait
+<code>shutdown_wait_secs</code> seconds.</li>
+<li>Send SIGTERM (<code>kill</code>) and wait at most <code>finalization_wait</code> seconds.</li>
+<li>Send SIGKILL (<code>kill -9</code>).</li>
+</ol>
+
+<p>If the executor notices that all <code>Process</code>es in a <code>Task</code> have aborted
+during this sequence, it will not proceed with subsequent steps.
+Note that graceful shutdown is best-effort, and due to the many
+inevitable realities of distributed systems, it may not be performed.</p>
+
+<h3 id="unexpected-termination-lost">Unexpected Termination: LOST</h3>
+
+<p>If a <code>Task</code> stays in a transient task state for too long (such as <code>ASSIGNED</code>
+or <code>STARTING</code>), the scheduler forces it into <code>LOST</code> state, creating a new
+<code>Task</code> in its place that’s sent into <code>PENDING</code> state.</p>
+
+<p>In addition, if the Mesos core tells the scheduler that an agent has
+become unhealthy (or outright disappeared), the <code>Task</code>s assigned to that
+agent go into <code>LOST</code> state and new <code>Task</code>s are created in their place. 
+From <code>PENDING</code> state, there is no guarantee a <code>Task</code> will be reassigned
+to the same machine unless job constraints explicitly force it there.</p>
+
+<h3 id="giving-priority-to-production-tasks-preempting">Giving Priority to Production Tasks: PREEMPTING</h3>
+
+<p>Sometimes a <code>Task</code> needs to be interrupted, such as when a non-production
+<code>Task</code>’s resources are needed by a higher-priority production <code>Task</code>. This
+type of interruption is called a <em>preemption</em>. When this happens in
+Aurora, the non-production <code>Task</code> is killed and moved into
+the <code>PREEMPTING</code> state when both of the following are true:</p>
+
+<ul>
+<li>The task being killed is a non-production task.</li>
+<li>The other task is a <code>PENDING</code> production task that hasn’t been
+scheduled due to a lack of resources.</li>
+</ul>
+
+<p>The scheduler UI shows that the non-production task was preempted in favor of
+the production task. At some point, tasks in <code>PREEMPTING</code> move to <code>KILLED</code>.</p>
+
+<p>Note that non-production tasks consuming many resources are likely to be
+preempted in favor of production tasks.</p>
+
+<h3 id="making-room-for-maintenance-draining">Making Room for Maintenance: DRAINING</h3>
+
+<p>Cluster operators can set an agent into maintenance mode. This will transition
+all <code>Task</code>s running on this agent into <code>DRAINING</code> and eventually to <code>KILLED</code>.
+Drained <code>Task</code>s will be restarted on other agents for which no maintenance
+has been announced yet.</p>
+
+<h2 id="state-reconciliation">State Reconciliation</h2>
+
+<p>Due to the many inevitable realities of distributed systems, there might
+be a mismatch between perceived and actual cluster state (e.g. 
a machine returns +from a <code>netsplit</code> but the scheduler has already marked all its <code>Task</code>s as +<code>LOST</code> and rescheduled them).</p> + +<p>Aurora regularly runs a state reconciliation process in order to detect +and correct such issues (e.g. by killing the errant <code>RUNNING</code> tasks). +By default, the proper detection of all failure scenarios and inconsistencies +may take up to an hour.</p> + +<p>To emphasize this point: there is no uniqueness guarantee for a single +instance of a job in the presence of network partitions. If the <code>Task</code> +requires that, it should be baked in at the application level using a +distributed coordination service such as Zookeeper.</p> + +</div> + + </div> + </div> + <div class="container-fluid section-footer buffer"> + <div class="container"> + <div class="row"> + <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> + <ul> + <li><a href="/downloads/">Downloads</a></li> + <li><a href="/community/">Mailing Lists</a></li> + <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> + <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> + </ul> + </div> + <div class="col-md-2"><h3>The ASF</h3> + <ul> + <li><a href="http://www.apache.org/licenses/">License</a></li> + <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> + <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> + <li><a href="http://www.apache.org/security/">Security</a></li> + </ul> + </div> + <div class="col-md-6"> + <p class="disclaimer">© 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. 
The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> + </div> + </div> + </div> + + </body> +</html>
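The escalation sequence the executor follows when killing a running task (described in the task-lifecycle page above) can be sketched as a small, ordered plan. This is an illustrative sketch only, not the Thermos executor's actual implementation; the endpoint paths and wait defaults shown are assumptions, standing in for whatever the job's `HttpLifecycleConfig` specifies:

```python
# Illustrative sketch of the kill-escalation order from the lifecycle docs.
# Endpoint paths and default waits are assumptions, not Aurora's real values.
GRACEFUL_SHUTDOWN_ENDPOINT = "/quitquitquit"   # assumed default
SHUTDOWN_ENDPOINT = "/abortabortabort"         # assumed default

def escalation_plan(has_http_lifecycle,
                    graceful_shutdown_wait_secs=5,
                    shutdown_wait_secs=5,
                    finalization_wait=30):
    """Return the ordered (action, max_wait_secs) steps for killing a task."""
    steps = []
    if has_http_lifecycle:
        # Steps 2 and 3: HTTP-based graceful shutdown.
        steps.append(("POST " + GRACEFUL_SHUTDOWN_ENDPOINT, graceful_shutdown_wait_secs))
        steps.append(("POST " + SHUTDOWN_ENDPOINT, shutdown_wait_secs))
    # Steps 4 and 5: signal-based termination.
    steps.append(("SIGTERM", finalization_wait))
    steps.append(("SIGKILL", 0))
    return steps

print(escalation_plan(has_http_lifecycle=False))  # → [('SIGTERM', 30), ('SIGKILL', 0)]
```

A task without an `HttpLifecycleConfig` therefore goes straight to the signal steps; in either case the real executor stops early if all `Process`es have already exited.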
Added: aurora/site/source/blog/2017-11-10-aurora-0-19-0-released.md
URL: http://svn.apache.org/viewvc/aurora/site/source/blog/2017-11-10-aurora-0-19-0-released.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/blog/2017-11-10-aurora-0-19-0-released.md (added)
+++ aurora/site/source/blog/2017-11-10-aurora-0-19-0-released.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,83 @@
+---
+layout: post
+title: 0.19.0 Released
+permalink: /blog/aurora-0-19-0-released/
+published: true
+post_author:
+  display_name: Bill Farner
+tags: Release
+---
+
+The latest Apache Aurora release, 0.19.0, is now available for
+[download](http://aurora.apache.org/downloads/). Here are some highlights in this release:
+
+- Added the ability to configure the executor's stop timeout, which is the maximum amount of time
+  the executor will wait during a graceful shutdown sequence before continuing the 'Forceful
+  Termination' process (see
+  [here](http://aurora.apache.org/documentation/latest/reference/task-lifecycle/) for details).
+- Added the ability to configure the wait period after calling the graceful shutdown endpoint and
+  the shutdown endpoint using the `graceful_shutdown_wait_secs` and `shutdown_wait_secs` fields in
+  `HttpLifecycleConfig` respectively. Previously, the executor would only wait 5 seconds between
+  steps (adding up to a total of 10 seconds as there are 2 steps). The overall waiting period is
+  bounded by the executor's stop timeout, which can be configured using the executor's
+  `stop_timeout_in_secs` flag.
+- Added the `thrift_method_interceptor_modules` scheduler flag that lets cluster operators inject
+  custom Thrift method interceptors.
+- Increased the default ZooKeeper session timeout from 4 to 15 seconds.
+- Added option `-zk_connection_timeout` to control the connection timeout of ZooKeeper connections. 
+- Added scheduler command line argument `-hold_offers_forever`, suitable for use in clusters where
+  Aurora is the only framework. This setting disables other options such as `-min_offer_hold_time`,
+  and allows the scheduler to more efficiently cache scheduling attempts.
+- The scheduler no longer uses an internal H2 database for storage.
+- There is a new Scheduler UI which, in addition to the facelift, provides the ability to inject your
+  own custom UI components.
+
+Deprecations and removals:
+
+- Removed the deprecated command line argument `-zk_use_curator`, removing the choice to use the
+  legacy ZooKeeper client.
+- Removed the `rewriteConfigs` thrift API call in the scheduler. This was a last-ditch mechanism
+  to modify scheduler state on the fly. It was considered extremely risky to use since its
+  inception, and is safer to abandon due to its lack of use and likelihood for code rot.
+- Removed the Job environment validation from the command line client. Validation was moved to
+  the scheduler side through the `allowed_job_environments` option. By default, any of `devel`,
+  `test`, `production`, and any value matching the regular expression `staging[0-9]*` is allowed.
+- Removed scheduler command line arguments related to the internal H2 database, which is no longer
+  used:
+  - `-use_beta_db_task_store`
+  - `-enable_db_metrics`
+  - `-slow_query_log_threshold`
+  - `-db_row_gc_interval`
+  - `-db_lock_timeout`
+  - `-db_max_active_connection_count`
+  - `-db_max_idle_connection_count`
+  - `-snapshot_hydrate_stores`
+  - `-enable_h2_console`
+
+Full release notes are available in the release
+[CHANGELOG](https://git-wip-us.apache.org/repos/asf?p=aurora.git&f=CHANGELOG&hb=rel/0.19.0).
+
+## Thanks
+
+Thanks to the 14 contributors who made Apache Aurora 0.19.0 possible:
+
+* Bill Farner
+* David McLaughlin
+* Derek Slager
+* Jordan Ly
+* Kai Huang
+* Keisuke Nishimoto
+* Mauricio Garavaglia
+* Renan DelValle
+* Reza Motamedi
+* Robert Allen
+* Ruben D. 
Porras +* Santhosh Kumar Shanmugham +* Stephan Erb +* Zameer Manji + + + + + + Added: aurora/site/source/documentation/0.19.0/additional-resources/presentations.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/additional-resources/presentations.md?rev=1814961&view=auto ============================================================================== --- aurora/site/source/documentation/0.19.0/additional-resources/presentations.md (added) +++ aurora/site/source/documentation/0.19.0/additional-resources/presentations.md Sat Nov 11 16:49:46 2017 @@ -0,0 +1,80 @@ +# Apache Aurora Presentations +Video and slides from presentations and panel discussions about Apache Aurora. + +_(Listed in date descending order)_ + +<table> + + <tr> + <td><img src="/documentation/0.19.0/images/presentations/10_08_2015_mesos_aurora_on_a_small_scale_thumb.png" alt="Mesos and Aurora on a Small Scale Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=q5iIqhaCJ_o">Mesos & Aurora on a Small Scale (Video)</a></strong> + <p>Presented by Florian Pfeiffer</p> + <p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/10_08_2015_sla_aware_maintenance_for_operators_thumb.png" alt="SLA Aware Maintenance for Operators Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=tZ0-SISvCis">SLA Aware Maintenance for Operators (Video)</a></strong> + <p>Presented by Joe Smith</p> + <p>October 8, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon-europe">#MesosCon Europe 2015</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/09_20_2015_shipping_code_with_aurora_thumb.png" alt="Shipping Code with Aurora Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=y1hi7K1lPkk">Shipping Code with Aurora 
(Video)</a></strong>
+	<p>Presented by Bill Farner</p>
+	<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
+  </tr>
+  <tr>
+    <td><img src="/documentation/0.19.0/images/presentations/09_20_2015_twitter_production_scale_thumb.png" alt="Twitter Production Scale Thumbnail" /></td>
+    <td><strong><a href="https://www.youtube.com/watch?v=nNrh-gdu9m4">Twitter’s Production Scale: Mesos and Aurora Operations (Video)</a></strong>
+	<p>Presented by Joe Smith</p>
+	<p>August 20, 2015 at <a href="http://events.linuxfoundation.org/events/archive/2015/mesoscon">#MesosCon 2015</a></p></td>
+  </tr>
+  <tr>
+    <td><img src="/documentation/0.19.0/images/presentations/04_30_2015_monolith_to_microservices_thumb.png" alt="From Monolith to Microservices with Aurora Video Thumbnail" /></td>
+    <td><strong><a href="https://www.youtube.com/watch?v=yXkOgnyK4Hw">From Monolith to Microservices w/ Aurora (Video)</a></strong>
+	<p>Presented by Thanos Baskous, Tony Dong, Dobromir Montauk</p>
+	<p>April 30, 2015 at <a href="http://www.meetup.com/Bay-Area-Apache-Aurora-Users-Group/events/221219480/">Bay Area Apache Aurora Users Group</a></p></td>
+  </tr>
+  <tr>
+    <td><img src="/documentation/0.19.0/images/presentations/03_07_2015_aurora_mesos_in_practice_at_twitter_thumb.png" alt="Aurora + Mesos in Practice at Twitter Thumbnail" /></td>
+    <td><strong><a href="https://www.youtube.com/watch?v=1XYJGX_qZVU">Aurora + Mesos in Practice at Twitter (Video)</a></strong>
+	<p>Presented by Bill Farner</p>
+	<p>March 07, 2015 at <a href="http://www.bigeng.io/aurora-mesos-in-practice-at-twitter">Bigcommerce TechTalk</a></p></td>
+  </tr>
+  <tr>
+    <td><img src="/documentation/0.19.0/images/presentations/02_28_2015_apache_aurora_thumb.png" alt="Apache Auroraの始めかた Slideshow Thumbnail" /></td>
+    <td><strong><a href="http://www.slideshare.net/zembutsu/apache-aurora-introduction-and-tutorial-osc15tk">Apache Auroraの始めかた (Slides)</a></strong>
+	
<p>Presented by Masahito Zembutsu</p> + <p>February 28, 2015 at <a href="http://www.ospn.jp/osc2015-spring/">Open Source Conference 2015 Tokyo Spring</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/02_19_2015_aurora_adopters_panel_thumb.png" alt="Apache Aurora Adopters Panel Video Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=2Jsj0zFdRlg">Apache Aurora Adopters Panel (Video)</a></strong> + <p>Panelists Ben Staffin, Josh Adams, Bill Farner, Berk Demir</p> + <p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/02_19_2015_aurora_at_twitter_thumb.png" alt="Operating Apache Aurora and Mesos at Twitter Video Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=E4lxX6epM_U">Operating Apache Aurora and Mesos at Twitter (Video)</a></strong> + <p>Presented by Joe Smith</p> + <p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/02_19_2015_aurora_at_tellapart_thumb.png" alt="Apache Aurora and Mesos at TellApart" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=ZZXtXLvTXAE">Apache Aurora and Mesos at TellApart (Video)</a></strong> + <p>Presented by Steve Niemitz</p> + <p>February 19, 2015 at <a href="http://www.meetup.com/Bay-Area-Mesos-User-Group/events/220279080/">Bay Area Mesos Users Group</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/08_21_2014_past_present_future_thumb.png" alt="Past, Present, and Future of the Aurora Scheduler Video Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=Dsc5CPhKs4o">Past, Present, and Future of the Aurora Scheduler (Video)</a></strong> + <p>Presented by Bill 
Farner</p> + <p>August 21, 2014 at <a href="http://events.linuxfoundation.org/events/archive/2014/mesoscon">#MesosCon 2014</a></p></td> + </tr> + <tr> + <td><img src="/documentation/0.19.0/images/presentations/03_25_2014_introduction_to_aurora_thumb.png" alt="Introduction to Apache Aurora Video Thumbnail" /></td> + <td><strong><a href="https://www.youtube.com/watch?v=asd_h6VzaJc">Introduction to Apache Aurora (Video)</a></strong> + <p>Presented by Bill Farner</p> + <p>March 25, 2014 at <a href="https://www.eventbrite.com/e/aurora-and-mesosframeworksmeetup-tickets-10850994617">Aurora and Mesos Frameworks Meetup</a></p></td> + </tr> +</table> Added: aurora/site/source/documentation/0.19.0/additional-resources/tools.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/additional-resources/tools.md?rev=1814961&view=auto ============================================================================== --- aurora/site/source/documentation/0.19.0/additional-resources/tools.md (added) +++ aurora/site/source/documentation/0.19.0/additional-resources/tools.md Sat Nov 11 16:49:46 2017 @@ -0,0 +1,24 @@ +# Tools + +Various tools integrate with Aurora. Is there a tool missing? Let us know, or submit a patch to add it! 
+
+* Load-balancing technology used to direct traffic to services running on Aurora:
+  - [synapse](https://github.com/airbnb/synapse) based on HAProxy
+  - [aurproxy](https://github.com/tellapart/aurproxy) based on nginx
+  - [jobhopper](https://github.com/benley/aurora-jobhopper) performs HTTP redirects for easy developer and administrator access
+
+* RPC libraries that integrate with Aurora's [service discovery mechanism](../../features/service-discovery/):
+  - [linkerd](https://linkerd.io/) RPC proxy
+  - [finagle](https://twitter.github.io/finagle) (Scala)
+  - [scales](https://github.com/steveniemitz/scales) (Python)
+
+* Monitoring:
+  - [collectd-aurora](https://github.com/zircote/collectd-aurora) for cluster monitoring using collectd
+  - [Prometheus Aurora exporter](https://github.com/tommyulfsparre/aurora_exporter) for cluster monitoring using Prometheus
+  - [Prometheus service discovery integration](http://prometheus.io/docs/operating/configuration/#zookeeper-serverset-sd-configurations-serverset_sd_config) for discovering and monitoring services running on Aurora
+
+* Packaging and deployment:
+  - [aurora-packaging](https://github.com/apache/aurora-packaging), the source of the official Aurora packages
+
+* Thrift Clients:
+  - [gorealis](https://github.com/rdelval/gorealis) for communicating with the scheduler using Go

Added: aurora/site/source/documentation/0.19.0/contributing.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/contributing.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/contributing.md (added)
+++ aurora/site/source/documentation/0.19.0/contributing.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,93 @@
+## Get the Source Code
+
+First things first, you'll need the source! 
The Aurora source is available from Apache git: + + git clone https://git-wip-us.apache.org/repos/asf/aurora + +Read the Style Guides +--------------------- +Aurora's codebase is primarily Java and Python and conforms to the Twitter Commons styleguides for +both languages. + +- [Java Style Guide](https://github.com/twitter/commons/blob/master/src/java/com/twitter/common/styleguide.md) +- [Python Style Guide](https://github.com/twitter/commons/blob/master/src/python/twitter/common/styleguide.md) + +## Find Something to Do + +There are issues in [Jira](https://issues.apache.org/jira/browse/AURORA) with the +["newbie" label](https://issues.apache.org/jira/issues/?jql=project%20%3D%20AURORA%20AND%20labels%20%3D%20newbie%20and%20resolution%3Dunresolved) +that are good starting places for new Aurora contributors; pick one of these and dive in! To assign +a task to yourself, first ask for your JIRA id to be whitelisted by either asking in IRC/Slack or by +emailing [email protected]. Once your JIRA account has been whitelisted you can assign tickets +to yourself. The next step is to prepare your patch and finally post it for review. + +## Getting your ReviewBoard Account + +Go to https://reviews.apache.org and create an account. + +## Setting up your ReviewBoard Environment + +Run `./rbt status`. The first time this runs it will bootstrap and you will be asked to login. +Subsequent runs will cache your login credentials. + +## Submitting a Patch for Review + +Post a review with `rbt`, fill out the fields in your browser and hit Publish. + + ./rbt post -o + +If you're unsure about who to add as a reviewer, you can default to adding Zameer Manji (zmanji) and +Joshua Cohen (jcohen). They will take care of finding an appropriate reviewer for the patch. + +Once you've done this, you probably want to mark the associated Jira issue as Reviewable. 
+
+## Updating an Existing Review
+
+Incorporate review feedback, make some more commits, update your existing review, fill out the
+fields in your browser and hit Publish.
+
+    ./rbt post -o -r <RB_ID>
+
+## Getting Your Review Merged
+
+If you're not an Aurora committer, one of the committers will merge your change in as described
+below. Generally, the last reviewer to give the review a 'Ship It!' will be responsible.
+
+### Merging Your Own Review (Committers)
+
+Once you have shipits from the right committers, merge your changes in a single commit and mark
+the review as submitted. The typical workflow is:
+
+    git checkout master
+    git pull origin master
+    ./rbt patch -c <RB_ID>  # Verify the automatically-generated commit message looks sane,
+                            # editing if necessary.
+    git show master         # Verify everything looks sane
+    git push origin master
+    ./rbt close <RB_ID>
+
+Note that even if you're developing using feature branches you will not use `git merge` - each
+commit will be an atomic change accompanied by a ReviewBoard entry.
+
+### Merging Someone Else's Review
+
+Sometimes you'll need to merge someone else's RB. The typical workflow for this is:
+
+    git checkout master
+    git pull origin master
+    ./rbt patch -c <RB_ID>
+    git show master         # Verify everything looks sane, author is correct
+    git push origin master
+
+Note for committers: while we generally use the commit message generated by `./rbt patch` some
+changes are often required:
+
+1. Ensure that the commit message does not exceed 100 characters per line.
+2. Remove the "Testing Done" section. It's generally redundant (can be seen by checking the linked
+   review) or entirely irrelevant to the commit itself.
+
+## Cleaning Up
+
+Your patch has landed, congratulations! The last thing you'll want to do before moving on to your
+next fix is to clean up your Jira and ReviewBoard: the former should be marked as
+"Resolved" and the latter as "Submitted". 
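The committer checklist above caps commit-message lines at 100 characters. A tiny Python helper for spotting overlong lines is sketched below; it is purely illustrative and not part of Aurora's tooling, and the sample message is hypothetical:

```python
def overlong_lines(message, limit=100):
    """Return (line_number, length) pairs for message lines over the limit."""
    return [(n, len(line))
            for n, line in enumerate(message.splitlines(), start=1)
            if len(line) > limit]

# Hypothetical commit message whose third line is 120 characters long.
msg = "Example summary line\n\n" + "x" * 120
print(overlong_lines(msg))  # → [(3, 120)]
```

An empty result means every line fits within the limit, so the message is safe to push as-is.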
Added: aurora/site/source/documentation/0.19.0/development/client.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/client.md?rev=1814961&view=auto ============================================================================== --- aurora/site/source/documentation/0.19.0/development/client.md (added) +++ aurora/site/source/documentation/0.19.0/development/client.md Sat Nov 11 16:49:46 2017 @@ -0,0 +1,148 @@ +Developing the Aurora Client +============================ + +The client is written in Python, and uses the +[Pants](http://pantsbuild.github.io/python-readme.html) build tool. + + +Building and Testing +-------------------- + +Building and testing the client code are both done using Pants. The relevant targets to know about +are: + + * Build a client executable: `./pants binary src/main/python/apache/aurora/client:aurora` + * Test client code: `./pants test src/test/python/apache/aurora/client/cli:cli` + +If you want to build a source distribution of the client, you need to run `./build-support/release/make-python-sdists`. + + +Creating Custom Builds +---------------------- + +There are situations where you may want to plug in custom logic to the Client that may not be +applicable to the open source codebase. Rather than create a whole CLI from scratch, you can +easily create your own custom, drop-in replacement aurora.pex using the pants build tool. + +First, create an AuroraCommandLine implementation as an entry-point for registering customizations: + + from apache.aurora.client.cli.client import AuroraCommandLine + + class CustomAuroraCommandLine(AuroraCommandLine): + """Custom AuroraCommandLine for your needs""" + + @property + def name(self): + return "your-company-aurora" + + @classmethod + def get_description(cls): + return 'Your Company internal Aurora client command line' + + def __init__(self): + super(CustomAuroraCommandLine, self).__init__() + # Add custom plugins.. 
+        self.register_plugin(YourCustomPlugin())
+
+      def register_nouns(self):
+        super(CustomAuroraCommandLine, self).register_nouns()
+        # You can even add new commands / sub-commands!
+        self.register_noun(YourStartUpdateProxy())
+        self.register_noun(YourDeployWorkflowCommand())
+
+Secondly, create a main entry point:
+
+    import sys
+
+    def proxy_main():
+      client = CustomAuroraCommandLine()
+      if len(sys.argv) == 1:
+        sys.argv.append("-h")
+      sys.exit(client.execute(sys.argv[1:]))
+
+Finally, you can wire everything up with a pants BUILD file in your project directory:
+
+    python_binary(
+      name='aurora',
+      entry_point='your_company.aurora.client:proxy_main',
+      dependencies=[
+        ':client_lib'
+      ]
+    )
+
+    python_library(
+      name='client_lib',
+      sources = [
+        'client.py',
+        'custom_plugin.py',
+        'custom_command.py',
+      ],
+      dependencies = [
+        # The Apache Aurora client
+        # Any other dependencies for your custom code
+      ],
+    )
+
+Using the same commands to build the client as above (but obviously pointing to this BUILD file
+instead), you will have a drop-in replacement aurora.pex file with your customizations.
+
+Running/Debugging
+------------------
+
+For manually testing client changes against a cluster, we use [Vagrant](https://www.vagrantup.com/).
+To start a virtual cluster, you need to install Vagrant, and then run `vagrant up` from the root of
+the aurora workspace. This will create a vagrant host named "devcluster", with a Mesos master, a set
+of Mesos agents, and an Aurora scheduler.
+
+If you have a change you would like to test in your local cluster, you'll rebuild the client:
+
+    vagrant ssh -c 'aurorabuild client'
+
+Once this completes, the `aurora` command will reflect your changes.
+
+
+Running/Debugging in PyCharm
+-----------------------------
+
+It's possible to use PyCharm to run and debug both the client and client tests in an IDE. 
In order
+to do this, first run:
+
+    build-support/python/make-pycharm-virtualenv
+
+This script will configure a virtualenv with all of our Python requirements. Once the script
+completes it will emit instructions for configuring PyCharm:
+
+    Your PyCharm environment is now set up. You can open the project root
+    directory with PyCharm.
+
+    Once the project is loaded:
+      - open project settings
+      - click 'Project Interpreter'
+      - click the cog in the upper-right corner
+      - click 'Add Local'
+      - select 'build-support/python/pycharm.venv/bin/python'
+      - click 'OK'
+
+### Running/Debugging Tests
+After following these instructions, you should now be able to run/debug tests directly from the IDE
+by right-clicking on a test (or test class) and choosing to run or debug:
+
+![Debug Client Test](../images/debug-client-test.png)
+
+If you've set a breakpoint, you can see the run will now stop and let you debug:
+
+![Debugging Client Test](../images/debugging-client-test.png)
+
+### Running/Debugging the Client
+Actually running and debugging the client is unfortunately a bit more complex. You'll need to create
+a Run configuration:
+
+* Go to Run → Edit Configurations
+* Click the + icon to add a new configuration.
+* Choose python and name the configuration 'client'.
+* Set the script path to `/your/path/to/aurora/src/main/python/apache/aurora/client/cli/client.py`
+* Set the script parameters to the command you want to run (e.g. `job status <job key>`)
+* Expand the Environment section and click the ellipsis to add a new environment variable
+* Click the + at the bottom to add a new variable named AURORA_CONFIG_ROOT whose value is the
+  path where your cluster configuration can be found. For example, to talk to the scheduler
+  running in the vagrant image, it would be set to `/your/path/to/aurora/examples/vagrant` (this
+  is the directory where our example clusters.json is found).
+* You should now be able to run and debug this configuration!
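The noun/verb wiring described in the Creating Custom Builds section above can be sketched without any Aurora dependencies. The classes below are simplified, hypothetical stand-ins for illustration only; the real client lives in `apache.aurora.client.cli` and has a much richer interface:

```python
# Illustrative stand-in for the noun/verb dispatch pattern that a custom
# AuroraCommandLine subclass builds on. All names here are hypothetical;
# this is not the apache.aurora.client.cli API.

class Noun(object):
    """A top-level command group (e.g. 'job') holding verb callables."""

    def __init__(self, name):
        self.name = name
        self.verbs = {}

    def register_verb(self, verb_name, fn):
        self.verbs[verb_name] = fn


class MiniCommandLine(object):
    """Dispatches argv of the form [noun, verb, args...] to a callable."""

    def __init__(self):
        self.nouns = {}
        self.register_nouns()

    def register_nouns(self):
        pass  # subclasses add their nouns here

    def register_noun(self, noun):
        self.nouns[noun.name] = noun

    def execute(self, argv):
        noun, verb = argv[0], argv[1]
        return self.nouns[noun].verbs[verb](argv[2:])


class CustomCommandLine(MiniCommandLine):
    """A 'custom build' adds commands by overriding register_nouns."""

    def register_nouns(self):
        super(CustomCommandLine, self).register_nouns()
        job = Noun('job')
        job.register_verb('status', lambda args: 'status of %s' % args[0])
        self.register_noun(job)


print(CustomCommandLine().execute(['job', 'status', 'west/www-data/prod/hello']))
# prints: status of west/www-data/prod/hello
```

A custom client built this way keeps the stock dispatch loop intact and layers company-specific nouns and verbs on top, which is the same shape as the `CustomAuroraCommandLine` example above.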
Added: aurora/site/source/documentation/0.19.0/development/committers-guide.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/committers-guide.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/committers-guide.md (added)
+++ aurora/site/source/documentation/0.19.0/development/committers-guide.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,105 @@
+Committer's Guide
+=================
+
+Information for official Apache Aurora committers.
+
+Setting up your email account
+-----------------------------
+Once your Apache ID has been set up, you can configure your account, add ssh keys, and set up an
+email forwarding address at
+
+    http://id.apache.org
+
+Additional instructions for setting up your new committer email can be found at
+
+    http://www.apache.org/dev/user-email.html
+
+The recommended setup is to configure all services (mailing lists, JIRA, ReviewBoard) to send
+emails to your @apache.org email address.
+
+
+Creating a gpg key for releases
+-------------------------------
+In order to create a release candidate you will need a gpg key published to an external key server,
+and that key will need to be added to our KEYS file as well.
+
+1. Create a key:
+
+        gpg --gen-key
+
+2. Add your gpg key to the Apache Aurora KEYS file:
+
+        git clone https://git-wip-us.apache.org/repos/asf/aurora.git
+        (gpg --list-sigs <KEY ID> && gpg --armor --export <KEY ID>) >> KEYS
+        git add KEYS && git commit -m "Adding gpg key for <APACHE ID>"
+        ./rbt post -o -g
+
+3. Publish the key to an external key server:
+
+        gpg --keyserver pgp.mit.edu --send-keys <KEY ID>
+
+4. Commit the updated KEYS file to the Apache Aurora svn dist locations listed below:
+
+        https://dist.apache.org/repos/dist/dev/aurora/KEYS
+        https://dist.apache.org/repos/dist/release/aurora/KEYS
+
+5.
Add your key to git config for use with the release scripts:
+
+        git config --global user.signingkey <KEY ID>
+
+
+Creating a release
+------------------
+The following will guide you through the steps to create a release candidate, vote, and finally an
+official Apache Aurora release. Before starting, your gpg key should be in the KEYS file and you
+must have access to commit to the dist.a.o repositories.
+
+1. Ensure that all issues resolved for this release candidate are tagged with the correct Fix
+Version in JIRA; the changelog script will use this to generate the CHANGELOG in step #2.
+To assign the fix version:
+
+    * Look up the [previous release date](https://issues.apache.org/jira/browse/aurora/?selectedTab=com.atlassian.jira.jira-projects-plugin:versions-panel).
+    * Query all issues resolved after that release date: `project = AURORA AND status in (resolved, Closed) and fixVersion is empty and resolutiondate >= "YYYY/MM/DD"`
+    * In the upper right corner of the query result, select Tools > Bulk Edit.
+    * Select all issues > edit issue > set 'Change Fix Version/s' to the release version.
+    * Make sure to uncheck 'Send mail for this update' at the bottom.
+
+2. Prepare RELEASE-NOTES.md for the release. This just boils down to removing the "(Not yet
+released)" suffix from the impending release.
+
+2. Create a release candidate. This will automatically update the CHANGELOG and commit it, create a
+branch and update the current version within the trunk. To create a minor version update and publish
+it run
+
+        ./build-support/release/release-candidate -l m -p
+
+3. Update, if necessary, the draft email created from the `release-candidate` script in step #2 and
+send the [VOTE] email to the dev@ mailing list. You can verify the release signature and checksums
+by running
+
+        ./build-support/release/verify-release-candidate
+
+4. Wait for the vote to complete.
If the vote fails, close the vote by replying to the initial [VOTE]
+email sent in step #3, editing the subject to [RESULT][VOTE] ... and noting the failure reason
+(example [here](http://markmail.org/message/d4d6xtvj7vgwi76f)). You'll also need to manually revert
+the commits generated by the release candidate script that incremented the snapshot version and
+updated the changelog. Once that is done, address any issues and go back to step #1 and run
+again; this time you will use the -r flag to increment the release candidate version. This will
+automatically clean up the release candidate rc0 branch and source distribution.
+
+        ./build-support/release/release-candidate -l m -r 1 -p
+
+5. Once the vote has successfully passed, create the release.
+
+**IMPORTANT: make sure to use the correct release at this final step (e.g.: `-r 1` if the rc1 candidate
+has been voted for). Once the release tag is pushed it will be very hard to undo due to a remote
+git pre-receive hook explicitly forbidding release tag manipulations.**
+
+        ./build-support/release/release
+
+6. Update the draft email created from the `release` script in step #5 to include the Apache IDs for
+all binding votes and send the [RESULT][VOTE] email to the dev@ mailing list.
+
+7. Update the [Aurora Website](http://aurora.apache.org/) by following the
+[instructions](https://svn.apache.org/repos/asf/aurora/site/README.md) on the ASF Aurora SVN repo.
+Remember to add a blog post under source/blog and regenerate the site before committing.
Added: aurora/site/source/documentation/0.19.0/development/db-migration.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/db-migration.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/db-migration.md (added)
+++ aurora/site/source/documentation/0.19.0/development/db-migration.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,34 @@
+DB Migrations
+=============
+
+Changes to the DB schema should be made in the form of migrations. This ensures that all changes
+are applied correctly after a DB dump from a previous version is restored.
+
+DB migrations are managed through a system built on top of
+[MyBatis Migrations](http://www.mybatis.org/migrations/). The migrations are run automatically when
+a snapshot is restored; no manual interaction is required by cluster operators.
+
+Upgrades
+--------
+When adding or altering tables or changing data, in addition to making the change in
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql), a new
+migration class should be created under the org.apache.aurora.scheduler.storage.db.migration
+package. The class should implement the [MigrationScript](https://github.com/mybatis/migrations/blob/master/src/main/java/org/apache/ibatis/migration/MigrationScript.java)
+interface (see [V001_TestMigration](https://github.com/apache/aurora/blob/rel/0.19.0/src/test/java/org/apache/aurora/scheduler/storage/db/testmigration/V001_TestMigration.java)
+as an example). The upgrade and downgrade scripts are defined in this class. When restoring a
+snapshot, the list of migrations on the classpath is compared to the list of applied changes in the
+DB. Any changes that have not yet been applied are executed and their downgrade script is stored
+alongside the changelog entry in the database to facilitate downgrades in the event of a rollback.
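The restore-time reconciliation described above can be sketched as follows. This is a language-agnostic illustration written in Python; the real implementation is Java code built on MyBatis Migrations, and the function name and data shapes here are assumptions for illustration only:

```python
# Sketch of the snapshot-restore reconciliation described above: migrations
# found on the classpath are compared against the changelog of applied
# changes; anything missing is applied and its downgrade script recorded so
# a later rollback can undo it. Hypothetical names, not the MyBatis API.

def reconcile(available, applied_changelog):
    """available: {version: (upgrade_sql, downgrade_sql)} on the classpath.
    applied_changelog: {version: downgrade_sql} already recorded in the DB,
    mutated in place as changes are applied."""
    executed = []
    for version in sorted(available):
        if version not in applied_changelog:
            upgrade_sql, downgrade_sql = available[version]
            executed.append(upgrade_sql)                 # run the upgrade
            applied_changelog[version] = downgrade_sql   # store for rollback
    # A version in the changelog but absent from the classpath signals a
    # rollback: its stored downgrade script is applied, newest first.
    rollbacks = [applied_changelog[v]
                 for v in sorted(applied_changelog, reverse=True)
                 if v not in available]
    return executed, rollbacks


executed, rollbacks = reconcile(
    available={1: ('V001 upgrade', 'V001 downgrade'),
               2: ('V002 upgrade', 'V002 downgrade')},
    applied_changelog={1: 'V001 downgrade'})
# Only the not-yet-applied change runs: executed == ['V002 upgrade']
```

Running the same function against a changelog containing a version with no matching classpath migration returns that version's stored downgrade script, which mirrors the Downgrades behavior described in the next section of this file.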
+
+Downgrades
+----------
+If, while running migrations, a rollback is detected (i.e. a change exists in the DB changelog that
+does not exist on the classpath), the downgrade script associated with each affected change is
+applied.
+
+Baselines
+---------
+After enough time has passed (at least 1 official release), it should be safe to baseline migrations
+if desired. This can be accomplished by ensuring the changes from migrations have been applied to
+[schema.sql](../../src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql) and then
+removing the corresponding migration classes and adding a migration to remove the changelog entries.
\ No newline at end of file
Added: aurora/site/source/documentation/0.19.0/development/design-documents.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/design-documents.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/design-documents.md (added)
+++ aurora/site/source/documentation/0.19.0/development/design-documents.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,23 @@
+Design Documents
+================
+
+Since its inception as an Apache project, larger feature additions to the
+Aurora code base have been discussed in the form of design documents. Design documents
+are living documents until a consensus has been reached to implement a feature
+in the proposed form.
+ +Current and past documents: + +* [Command Hooks for the Aurora Client](../design/command-hooks/) +* [Dynamic Reservations](https://docs.google.com/document/d/19gV8Po6DIHO14tOC7Qouk8RnboY8UCfRTninwn_5-7c/edit) +* [GPU Resources in Aurora](https://docs.google.com/document/d/1J9SIswRMpVKQpnlvJAMAJtKfPP7ZARFknuyXl-2aZ-M/edit) +* [Health Checks for Updates](https://docs.google.com/document/d/1KOO0LC046k75TqQqJ4c0FQcVGbxvrn71E10wAjMorVY/edit) +* [JobUpdateDiff thrift API](https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8/edit) +* [REST API RFC](https://docs.google.com/document/d/11_lAsYIRlD5ETRzF2eSd3oa8LXAHYFD8rSetspYXaf4/edit) +* [Revocable Mesos offers in Aurora](https://docs.google.com/document/d/1r1WCHgmPJp5wbrqSZLsgtxPNj3sULfHrSFmxp2GyPTo/edit) +* [Supporting the Mesos Universal Containerizer](https://docs.google.com/document/d/111T09NBF2zjjl7HE95xglsDpRdKoZqhCRM5hHmOfTLA/edit?usp=sharing) +* [Tier Management In Apache Aurora](https://docs.google.com/document/d/1erszT-HsWf1zCIfhbqHlsotHxWUvDyI2xUwNQQQxLgs/edit?usp=sharing) +* [Ubiquitous Jobs](https://docs.google.com/document/d/12hr6GnUZU3mc7xsWRzMi3nQILGB-3vyUxvbG-6YmvdE/edit) +* [Pluggable Scheduling](https://docs.google.com/document/d/1fVHLt9AF-YbOCVCDMQmi5DATVusn-tqY8DldKbjVEm0/edit) + +Design documents can be found in the Aurora issue tracker via the query [`project = AURORA AND text ~ "docs.google.com" ORDER BY created`](https://issues.apache.org/jira/browse/AURORA-1528?jql=project%20%3D%20AURORA%20AND%20text%20~%20%22docs.google.com%22%20ORDER%20BY%20created). 
Added: aurora/site/source/documentation/0.19.0/development/design/command-hooks.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/design/command-hooks.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/design/command-hooks.md (added)
+++ aurora/site/source/documentation/0.19.0/development/design/command-hooks.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,102 @@
+# Command Hooks for the Aurora Client
+
+## Introduction/Motivation
+
+We've got hooks in the client that surround API calls. These are
+pretty awkward, because they don't correlate with user actions. For
+example, suppose we wanted a policy that said users weren't allowed to
+kill all instances of a production job at once.
+
+Right now, all that we could hook would be the "killJob" API call. But
+kill (at least in newer versions of the client) normally runs in
+batches. If a user called killall, what we would see on the API level
+is a series of "killJob" calls, each of which specified a batch of
+instances. We wouldn't be able to distinguish between really killing
+all instances of a job (which is forbidden under this policy), and
+carefully killing in batches (which is permitted). In each case, the
+hook would just see a series of API calls, and couldn't find out what
+the actual command being executed was!
+
+For most policy enforcement, what we really want to be able to do is
+look at and vet the commands that a user is performing, not the API
+calls that the client uses to implement those commands.
+
+So I propose that we add a new kind of hook, which surrounds noun/verb
+commands. A hook will register itself to handle a collection of (noun,
+verb) pairs. Whenever any of those noun/verb commands are invoked, the
+hook's methods will be called around the execution of the verb.
A
+pre-hook will have the ability to reject a command, preventing the
+verb from being executed.
+
+## Registering Hooks
+
+These hooks will be registered via configuration plugins. A configuration plugin
+can register hooks using an API. Hooks registered this way are, effectively,
+hardwired into the client executable.
+
+The order of execution of hooks is unspecified: they may be called in
+any order. There is no way to guarantee that one hook will execute
+before some other hook.
+
+
+### Global Hooks
+
+Hooks registered by the Python call are called _global_ hooks,
+because they will run for all configurations, whether or not they
+specify any hooks in the configuration file.
+
+In the implementation, hooks are registered in the module
+`apache.aurora.client.cli.command_hooks`, using the class
+`GlobalCommandHookRegistry`. A global hook can be registered by calling
+`GlobalCommandHookRegistry.register_command_hook` in a configuration plugin.
+
+### The API
+
+    class CommandHook(object):
+      @property
+      def name(self):
+        """Returns a name for the hook."""
+
+      def get_nouns(self):
+        """Return the nouns that have verbs that should invoke this hook."""
+
+      def get_verbs(self, noun):
+        """Return the verbs for a particular noun that should invoke this hook."""
+
+      @abstractmethod
+      def pre_command(self, noun, verb, context, commandline):
+        """Execute a hook before invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook.
+        * commandline: the original argv collection used to invoke the client.
+        Returns: True if the command should be allowed to proceed; False if the command
+        should be rejected.
+        """
+
+      def post_command(self, noun, verb, context, commandline, result):
+        """Execute a hook after invoking a verb.
+        * noun: the noun being invoked.
+        * verb: the verb being invoked.
+        * context: the context object that will be used to invoke the verb.
+          The options object will be initialized before calling the hook.
+        * commandline: the original argv collection used to invoke the client.
+        * result: the result code returned by the verb.
+        Returns: nothing
+        """
+
+    class GlobalCommandHookRegistry(object):
+      @classmethod
+      def register_command_hook(cls, hook):
+        pass
+
+### Skipping Hooks
+
+To skip a hook, a user uses a command-line option, `--skip-hooks`. The option can either
+specify specific hooks to skip, or "all":
+
+* `aurora --skip-hooks=all job create east/bozo/devel/myjob` will create a job
+  without running any hooks.
+* `aurora --skip-hooks=test,iq job create east/bozo/devel/myjob` will create a job,
+  and will skip only the hooks named "test" and "iq".
Added: aurora/site/source/documentation/0.19.0/development/scheduler.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/scheduler.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/scheduler.md (added)
+++ aurora/site/source/documentation/0.19.0/development/scheduler.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,118 @@
+Developing the Aurora Scheduler
+===============================
+
+The Aurora scheduler is written in Java and built with [Gradle](http://gradle.org).
+
+
+Prerequisite
+============
+
+When using Apache Aurora checked out from the source repository or the binary
+distribution, the Gradle wrapper and JavaScript dependencies are provided.
+However, you need to manually install them when using the source release
+downloads:
+
+1. Install Gradle following the instructions on the [Gradle web site](http://gradle.org)
+2.
From the root directory of the Apache Aurora project generate the Gradle +wrapper by running: + + gradle wrapper + + +Getting Started +=============== + +You will need Java 8 installed and on your `PATH` or unzipped somewhere with `JAVA_HOME` set. Then + + ./gradlew tasks + +will bootstrap the build system and show available tasks. This can take a while the first time you +run it but subsequent runs will be much faster due to cached artifacts. + +Running the Tests +----------------- +Aurora has a comprehensive unit test suite. To run the tests use + + ./gradlew build + +Gradle will only re-run tests when dependencies of them have changed. To force a re-run of all +tests use + + ./gradlew clean build + +Running the build with code quality checks +------------------------------------------ +To speed up development iteration, the plain gradle commands will not run static analysis tools. +However, you should run these before posting a review diff, and **always** run this before pushing a +commit to origin/master. + + ./gradlew build -Pq + +Running integration tests +------------------------- +To run the same tests that are run in the Apache Aurora continuous integration +environment: + + ./build-support/jenkins/build.sh + +In addition, there is an end-to-end test that runs a suite of aurora commands +using a virtual cluster: + + ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh + +Creating a bundle for deployment +-------------------------------- +Gradle can create a zip file containing Aurora, all of its dependencies, and a launch script with + + ./gradlew distZip + +or a tar file containing the same files with + + ./gradlew distTar + +The output file will be written to `dist/distributions/aurora-scheduler.zip` or +`dist/distributions/aurora-scheduler.tar`. + + + +Developing Aurora Java code +=========================== + +Setting up an IDE +----------------- +Gradle can generate project files for your IDE. 
To generate an IntelliJ IDEA project run + + ./gradlew idea + +and import the generated `aurora.ipr` file. + +Adding or Upgrading a Dependency +-------------------------------- +New dependencies can be added from Maven central by adding a `compile` dependency to `build.gradle`. +For example, to add a dependency on `com.example`'s `example-lib` 1.0 add this block: + + compile 'com.example:example-lib:1.0' + +NOTE: Anyone thinking about adding a new dependency should first familiarize themselves with the +Apache Foundation's third-party licensing +[policy](http://www.apache.org/legal/resolved.html#category-x). + + + +Developing the Aurora Build System +================================== + +Bootstrapping Gradle +-------------------- +The following files were autogenerated by `gradle wrapper` using gradle's +[Wrapper](http://www.gradle.org/docs/current/dsl/org.gradle.api.tasks.wrapper.Wrapper.html) plugin and +should not be modified directly: + + ./gradlew + ./gradlew.bat + ./gradle/wrapper/gradle-wrapper.jar + ./gradle/wrapper/gradle-wrapper.properties + +To upgrade Gradle unpack the new version somewhere, run `/path/to/new/gradle wrapper` in the +repository root and commit the changed files. + Added: aurora/site/source/documentation/0.19.0/development/thermos.md URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/thermos.md?rev=1814961&view=auto ============================================================================== --- aurora/site/source/documentation/0.19.0/development/thermos.md (added) +++ aurora/site/source/documentation/0.19.0/development/thermos.md Sat Nov 11 16:49:46 2017 @@ -0,0 +1,126 @@ +The Python components of Aurora are built using [Pants](https://pantsbuild.github.io). + + +Python Build Conventions +======================== +The Python code is laid out according to the following conventions: + +1. 1 `BUILD` per 3rd level directory. 
For a list of current top-level packages run:
+
+        % find src/main/python -maxdepth 3 -mindepth 3 -type d |\
+        while read dname; do echo $dname |\
+            sed 's@src/main/python/\(.*\)/\(.*\)/\(.*\).*@\1.\2.\3@'; done
+
+2. Each `BUILD` file exports 1
+   [`python_library`](https://pantsbuild.github.io/build_dictionary.html#bdict_python_library)
+   that provides a
+   [`setup_py`](https://pantsbuild.github.io/build_dictionary.html#setup_py)
+   containing each
+   [`python_binary`](https://pantsbuild.github.io/build_dictionary.html#python_binary)
+   in the `BUILD` file, named the same as the directory it's in so that it can be referenced
+   without a ':' character. The `sources` field in the `python_library` will almost always be
+   `rglobs('*.py')`.
+
+3. Other BUILD files may only depend on this single public `python_library`
+   target. Any other target is considered a private implementation detail and
+   should be prefixed with an `_`.
+
+4. `python_binary` targets are always named the same as the exported console script.
+
+5. `python_binary` targets must have identical `dependencies` to the `python_library` exported
+   by the package and must use `entry_point`.
+
+   This means a PEX file generated by pants will contain exactly the same files that will be
+   available on the `PYTHONPATH` in the case of `pip install` of the corresponding library
+   target. This will help our migration off of Pants in the future.
+
+Annotated example - apache.thermos.runner
+-----------------------------------------
+
+    % find src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner
+    src/main/python/apache/thermos/runner/__init__.py
+    src/main/python/apache/thermos/runner/thermos_runner.py
+    src/main/python/apache/thermos/runner/BUILD
+    % cat src/main/python/apache/thermos/runner/BUILD
+    # License boilerplate omitted
+    import os
+
+
+    # Private target so that a setup_py can exist without a circular dependency. Only targets within
+    # this file should depend on this.
+ python_library( + name = '_runner', + # The target covers every python file under this directory and subdirectories. + sources = rglobs('*.py'), + dependencies = [ + '3rdparty/python:twitter.common.app', + '3rdparty/python:twitter.common.log', + # Source dependencies are always referenced without a ':'. + 'src/main/python/apache/thermos/common', + 'src/main/python/apache/thermos/config', + 'src/main/python/apache/thermos/core', + ], + ) + + # Binary target for thermos_runner.pex. Nothing should depend on this - it's only used as an + # argument to ./pants binary. + python_binary( + name = 'thermos_runner', + # Use entry_point, not source so the files used here are the same ones tests see. + entry_point = 'apache.thermos.bin.thermos_runner', + dependencies = [ + # Notice that we depend only on the single private target from this BUILD file here. + ':_runner', + ], + ) + + # The public library that everyone importing the runner symbols uses. + # The test targets and any other dependent source code should depend on this. + python_library( + name = 'runner', + dependencies = [ + # Again, notice that we depend only on the single private target from this BUILD file here. + ':_runner', + ], + # We always provide a setup_py. This will cause any dependee libraries to automatically + # reference this library in their requirements.txt rather than copy the source files into their + # sdist. + provides = setup_py( + # Conventionally named and versioned. + name = 'apache.thermos.runner', + version = open(os.path.join(get_buildroot(), '.auroraversion')).read().strip().upper(), + ).with_binaries({ + # Every binary in this file should also be repeated here. + # Always use the dict-form of .with_binaries so that commands with dashes in their names are + # supported. + # The console script name is always the same as the PEX with .pex stripped. 
+        'thermos_runner': ':thermos_runner',
+      }),
+    )
+
+
+
+Thermos Test resources
+======================
+
+The Aurora source repository and distributions contain several
+[binary files](../../src/test/resources/org/apache/thermos/root/checkpoints) to
+qualify the backwards-compatibility of thermos with checkpoint data. Since
+thermos persists state to disk (to be read by the thermos observer), it is important that we have
+tests that prevent regressions affecting the ability to parse previously-written data.
+
+The files included represent persisted checkpoints that exercise different
+features of thermos. The existing files should not be modified unless
+we are accepting backwards incompatibility, such as with a major release.
+
+It is not practical to write source code to generate these files on the fly,
+as source would be vulnerable to drift (e.g. due to refactoring) in ways
+that would undermine the goal of ensuring backwards compatibility.
+
+The most common reason to add a new checkpoint file would be to provide
+coverage for new thermos features that alter the data format. This is
+accomplished by writing and running a
+[job configuration](../../reference/configuration/) that exercises the feature, and
+copying the checkpoint file from the sandbox directory; by default this is
+`/var/run/thermos/checkpoints/<aurora task id>`.
Added: aurora/site/source/documentation/0.19.0/development/thrift.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/thrift.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/thrift.md (added)
+++ aurora/site/source/documentation/0.19.0/development/thrift.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,54 @@
+Thrift
+======
+
+Aurora uses [Apache Thrift](https://thrift.apache.org/) for representing structured data in
+its client/server RPC protocol as well as for internal data storage.
While Thrift is capable of
+correctly handling additions and renames of existing members, field removals must be done
+carefully to ensure backwards compatibility and provide a predictable deprecation cycle. This
+document describes general guidelines for making Thrift schema changes to the existing fields in
+[api.thrift](https://github.com/apache/aurora/blob/rel/0.19.0/api/src/main/thrift/org/apache/aurora/gen/api.thrift).
+
+It is highly recommended to go through the
+[Thrift: The Missing Guide](http://diwakergupta.github.io/thrift-missing-guide/) first to refresh on
+basic Thrift schema concepts.
+
+Checklist
+---------
+Every existing Thrift schema modification is unique in its requirements and must be analyzed
+carefully to identify its scope and expected consequences. The following checklist may help in that
+analysis:
+* Is this a new field/struct? If yes, go ahead
+* Is this a pure field/struct rename without any type/structure change? If yes, go ahead and rename
+* Anything else, read further to make sure your change is properly planned
+
+Deprecation cycle
+-----------------
+Any time a breaking change (e.g.: field replacement or removal) is required, the following cycle
+must be followed:
+
+### vCurrent
+Change is applied in a way that does not prevent a scheduler/client at this version from
+communicating with a scheduler/client at vCurrent-1.
+* Do not remove or rename the old field
+* Add a new field as an eventual replacement of the old one and implement a dual read/write
+anywhere the old field is used. If a thrift struct is mapped in the DB store make sure both columns
+are marked as `NOT NULL`
+* Check [storage.thrift](https://github.com/apache/aurora/blob/rel/0.19.0/api/src/main/thrift/org/apache/aurora/gen/storage.thrift) to see if
+the affected struct is stored in Aurora scheduler storage. If so, it's almost certainly also
+necessary to perform a [DB migration](../db-migration/).
+* Add a deprecation jira ticket into the vCurrent+1 release candidate
+* Add a TODO for the deprecated field mentioning the jira ticket
+
+### vCurrent+1
+Finalize the change by removing the deprecated fields from the Thrift schema.
+* Drop any dual read/write routines added in the previous version
+* Remove thrift backfilling in scheduler
+* Remove the deprecated Thrift field
+
+Testing
+-------
+It's always advisable to test your changes in the local vagrant environment to build more
+confidence that your change is backwards compatible. It's easy to simulate different
+client/scheduler versions by playing with the `aurorabuild` command. See [this document](../../getting-started/vagrant/)
+for more.
+
Added: aurora/site/source/documentation/0.19.0/development/ui.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/development/ui.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/development/ui.md (added)
+++ aurora/site/source/documentation/0.19.0/development/ui.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,46 @@
+Developing the Aurora Scheduler UI
+==================================
+
+Installing bower (optional)
+----------------------------
+Third party JS libraries used in Aurora (located at 3rdparty/javascript/bower_components) are
+managed by bower, a JS dependency manager. Bower is only required if you plan to add, remove or
+update JS libraries. Bower can be installed using the following command:
+
+    npm install -g bower
+
+Bower depends on node.js and npm. The easiest way to install node on a Mac is via brew:
+
+    brew install node
+
+For more node.js installation options refer to https://github.com/joyent/node/wiki/Installation.
+
+More info on installing and using bower can be found at: http://bower.io/.
Once installed, you can
+use the following commands to view and modify the bower repo at
+3rdparty/javascript/bower_components:
+
+    bower list
+    bower install <library name>
+    bower remove <library name>
+    bower update <library name>
+    bower help
+
+
+Faster Iteration in Vagrant
+---------------------------
+The scheduler serves UI assets from the classpath. For production deployments this means the assets
+are served from within a jar. However, for faster development iteration, the vagrant image is
+configured to add the `scheduler` subtree of `/vagrant/dist/resources/main` to the head of
+`CLASSPATH`. This path is configured as a shared filesystem to the path on the host system where
+your Aurora repository lives. This means that any updates under `dist/resources/main/scheduler` in
+your checkout will be reflected immediately in the UI served from within the vagrant image.
+
+The one caveat to this is that this path is under `dist`, not `src`. This is because the assets must
+be processed by gradle before they can be served. So, unfortunately, you cannot just save your local
+changes and see them reflected in the UI; you must first run `./gradlew processResources`. This is
+less than ideal, but better than having to restart the scheduler after every change. Additionally,
+gradle makes this process somewhat easier with the use of the `--continuous` flag. If you run
+`./gradlew processResources --continuous`, gradle will monitor the filesystem for changes and run the
+task automatically as necessary. This doesn't quite provide hot-reload capabilities, but it does
+allow for <5s from save to changes being visible in the UI, with no further action required on the
+part of the developer.
Added: aurora/site/source/documentation/0.19.0/features/constraints.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/features/constraints.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/features/constraints.md (added)
+++ aurora/site/source/documentation/0.19.0/features/constraints.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,126 @@
+Scheduling Constraints
+======================
+
+By default, Aurora will pick any random agent with sufficient resources
+in order to schedule a task. This scheduling choice can be further
+restricted with the help of constraints.
+
+
+Mesos Attributes
+----------------
+
+Data centers are often organized with hierarchical failure domains. Common failure domains
+include hosts, racks, rows, and PDUs. If you have this information available, it is wise to tag
+the Mesos agent with them as
+[attributes](https://mesos.apache.org/documentation/attributes-resources/).
+
+The Mesos agent `--attributes` command line argument can be used to mark agents with
+static key/value pairs, so-called attributes (not to be confused with `--resources`, which are
+dynamic and accounted).
+
+For example, consider the host `cluster1-aaa-03-sr2` and its following attributes (given in
+key:value format): `host:cluster1-aaa-03-sr2` and `rack:aaa`.
+
+Aurora makes these attributes available for matching with scheduling constraints.
+
+
+Limit Constraints
+-----------------
+
+Limit constraints allow you to control machine diversity. The below
+constraint ensures that no more than two instances of your job may run on a single host.
+Think of this as a "group by" limit.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'host': 'limit:2',
+      }
+      ...
+    )
+
+
+Likewise, you can use constraints to control rack diversity, e.g.
at
+most one task per rack:
+
+    constraints = {
+      'rack': 'limit:1',
+    }
+
+Use these constraints sparingly as they can dramatically reduce Tasks' schedulability.
+Further details are available in the reference documentation on
+[Scheduling Constraints](../../reference/configuration/#specifying-scheduling-constraints).
+
+
+
+Value Constraints
+-----------------
+
+Value constraints can be used to express that a certain attribute with a certain value
+should be present on a Mesos agent. For example, the following job would only be
+scheduled on nodes that claim to have an `SSD` as their disk.
+
+    Service(
+      name = 'webservice',
+      role = 'www-data',
+      constraints = {
+        'disk': 'SSD',
+      }
+      ...
+    )
+
+
+Further details are available in the reference documentation on
+[Scheduling Constraints](../../reference/configuration/#specifying-scheduling-constraints).
+
+
+Running stateful services
+-------------------------
+
+Aurora is best suited to run stateless applications, but it also accommodates stateful services
+like databases, or services that otherwise need to always run on the same machines.
+
+### Dedicated attribute
+
+Most Mesos attributes are arbitrary and available for custom use. There is one exception,
+though: the `dedicated` attribute. Aurora treats this attribute specially: jobs with a matching
+constraint will only be scheduled on these machines, and these machines will only run matching
+jobs.
+
+
+#### Syntax
+The dedicated attribute has semantic meaning. The format is `$role(/.*)?`. When a job is created,
+the scheduler requires that the `$role` component matches the `role` field in the job
+configuration, and will reject the job creation otherwise. The remainder of the attribute is
+free-form. We've developed the idiom of formatting this attribute as `$role/$job`, but do not
+enforce this.
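The role-matching rule can be illustrated with a small hypothetical Python sketch. This is not the scheduler's actual code, just an approximation of the check it performs on the `$role` component (a `*` role, described below, matches any job role):

```python
import re

def dedicated_role_matches(job_role, dedicated_value):
    # The attribute format is $role(/.*)?; only the $role component is
    # compared against the job's role. Hypothetical helper, not scheduler code.
    match = re.match(r'^([^/]+)(/.*)?$', dedicated_value)
    if match is None:
        return False
    role = match.group(1)
    return role in ('*', job_role)

# A www-data job may use the idiomatic $role/$job form...
assert dedicated_role_matches('www-data', 'www-data/web.multi')
# ...while a role mismatch would be rejected at job creation time.
assert not dedicated_role_matches('vagrant', 'www-data/web.multi')
```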
For example: a job `devcluster/www-data/prod/hello` with a dedicated constraint set as
+`www-data/web.multi` will have its tasks scheduled only on Mesos agents configured with:
+`--attributes=dedicated:www-data/web.multi`.
+
+A wildcard (`*`) may be used for the role portion of the dedicated attribute, which allows any
+owner to elect to run a job on the host(s). For example: tasks from both
+`devcluster/www-data/prod/hello` and `devcluster/vagrant/test/hello` with a dedicated constraint
+formatted as `*/web.multi` will be scheduled only on Mesos agents configured with
+`--attributes=dedicated:*/web.multi`. This may be useful when assembling a virtual cluster of
+machines sharing the same set of traits or requirements.
+
+##### Example
+Consider the following agent command line:
+
+    mesos-slave --attributes="dedicated:db_team/redis" ...
+
+And this job configuration:
+
+    Service(
+      name = 'redis',
+      role = 'db_team',
+      constraints = {
+        'dedicated': 'db_team/redis'
+      }
+      ...
+    )
+
+The job configuration indicates that it should only be scheduled on agents with the attribute
+`dedicated:db_team/redis`. Additionally, Aurora will prevent any tasks that do _not_ have that
+constraint from running on those agents.
+

Added: aurora/site/source/documentation/0.19.0/features/containers.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/features/containers.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/features/containers.md (added)
+++ aurora/site/source/documentation/0.19.0/features/containers.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,130 @@
+Containers
+==========
+
+Aurora supports several containerizers, notably the Mesos containerizer and the Docker
+containerizer.
The Mesos containerizer uses native OS features directly to provide isolation between
+containers, while the Docker containerizer delegates container management to the Docker engine.
+
+Support for launching container images via either containerizer has to be
+[enabled by a cluster operator](../../operations/configuration/#containers).
+
+Mesos Containerizer
+-------------------
+
+The Mesos containerizer is the native Mesos containerization solution. It allows tasks to be
+run with an array of [pluggable isolators](../resource-isolation/) and can launch tasks using
+[Docker](https://github.com/docker/docker/blob/master/image/spec/v1.md) images,
+[AppC](https://github.com/appc/spec/blob/master/SPEC.md) images, or directly on the agent host
+filesystem.
+
+The following example (available in our [Vagrant environment](../../getting-started/vagrant/))
+launches a hello world program within a `debian/jessie` Docker image:
+
+    $ cat /vagrant/examples/jobs/hello_docker_image.aurora
+    hello_loop = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = Task(
+      processes = [hello_loop],
+      resources = Resources(cpu=1, ram=1*MB, disk=8*MB)
+    )
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker_image',
+        task = task,
+        container = Mesos(image=DockerImage(name='debian', tag='jessie'))
+      )
+    ]
+
+Docker and AppC images are designated using an appropriate `image` property of the `Mesos`
+configuration object. If either `container` or `image` is left unspecified, the host filesystem
+will be used. Further details of how to specify images can be found in the
+[Reference Documentation](../../reference/configuration/#mesos-object).
+
+By default, Aurora launches processes as the Linux user with the same name as the job's role
+(e.g. `www-data` in the example above). This user has to exist on the host filesystem.
If it does not exist within
+the container image, it will be created automatically. Otherwise, this user and its primary group
+have to exist in the image with matching uid/gid.
+
+For more information on the Mesos containerizer filesystem, namespace, and isolator features, visit
+[Mesos Containerizer](http://mesos.apache.org/documentation/latest/mesos-containerizer/) and
+[Mesos Container Images](http://mesos.apache.org/documentation/latest/container-image/).
+
+
+Docker Containerizer
+--------------------
+
+The Docker containerizer launches container images using the Docker engine. It may often provide
+more advanced features than the native Mesos containerizer, but has to be installed separately from
+Mesos on each agent host.
+
+Starting with the 0.17.0 release, `image` can be specified with a `{{docker.image[name][tag]}}` binder so that
+the tag can be resolved to a concrete image digest. This ensures that the job always uses the same image
+across restarts, even if the version identified by the tag has been updated, guaranteeing that only job
+updates can mutate configuration.
+
+Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
+
+    $ cat /vagrant/examples/jobs/hello_docker_engine.aurora
+    hello_loop = Process(
+      name = 'hello',
+      cmdline = """
+        while true; do
+          echo hello world
+          sleep 10
+        done
+      """)
+
+    task = Task(
+      processes = [hello_loop],
+      resources = Resources(cpu=1, ram=1*MB, disk=8*MB)
+    )
+
+    jobs = [
+      Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker',
+        task = task,
+        container = Docker(image = 'python:2.7')
+      ), Service(
+        cluster = 'devcluster',
+        environment = 'devel',
+        role = 'www-data',
+        name = 'hello_docker_engine_binding',
+        task = task,
+        container = Docker(image = '{{docker.image[library/python][2.7]}}')
+      )
+    ]
+
+Note that this feature requires a v2 Docker registry.
If using a private Docker registry, its URL
+must be specified in the `clusters.json` configuration file under the key `docker_registry`.
+If not specified, `docker_registry` defaults to `https://registry-1.docker.io` (Docker Hub).
+
+Example:
+
+    # clusters.json
+    [{
+      "name": "devcluster",
+      ...
+      "docker_registry": "https://registry.example.com"
+    }]
+
+Details of how to use Docker via the Docker engine can be found in the
+[Reference Documentation](../../reference/configuration/#docker-object). Please note that in order to
+correctly execute processes inside a job, the Docker container must have Python 2.7 and potentially
+further Mesos dependencies installed. This limitation does not hold for Docker containers used via
+the Mesos containerizer.
+
+For more information on launching Docker containers through the Docker containerizer, visit
+[Docker Containerizer](http://mesos.apache.org/documentation/latest/docker-containerizer/).

Added: aurora/site/source/documentation/0.19.0/features/cron-jobs.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/0.19.0/features/cron-jobs.md?rev=1814961&view=auto
==============================================================================
--- aurora/site/source/documentation/0.19.0/features/cron-jobs.md (added)
+++ aurora/site/source/documentation/0.19.0/features/cron-jobs.md Sat Nov 11 16:49:46 2017
@@ -0,0 +1,124 @@
+# Cron Jobs
+
+Aurora supports execution of scheduled jobs on a Mesos cluster using cron-style syntax.
+
+- [Overview](#overview)
+- [Collision Policies](#collision-policies)
+- [Failure recovery](#failure-recovery)
+- [Interacting with cron jobs via the Aurora CLI](#interacting-with-cron-jobs-via-the-aurora-cli)
+  - [cron schedule](#cron-schedule)
+  - [cron deschedule](#cron-deschedule)
+  - [cron start](#cron-start)
+  - [job killall, job restart, job kill](#job-killall-job-restart-job-kill)
+- [Technical Note About Syntax](#technical-note-about-syntax)
+- [Caveats](#caveats)
+  - [Failovers](#failovers)
+  - [Collision policy is best-effort](#collision-policy-is-best-effort)
+  - [Timezone Configuration](#timezone-configuration)
+
+## Overview
+
+A job is identified as a cron job by the presence of a
+`cron_schedule` attribute containing a cron-style schedule in the
+[`Job`](../../reference/configuration/#job-objects) object. Examples of cron schedules
+include "every 5 minutes" (`*/5 * * * *`), "Fridays at 17:00" (`0 17 * * FRI`), and
+"the 1st and 15th day of the month at 03:00" (`0 3 1,15 * *`).
+
+Example (available in the [Vagrant environment](../../getting-started/vagrant/)):
+
+    $ cat /vagrant/examples/jobs/cron_hello_world.aurora
+    # A cron job that runs every 5 minutes.
+    jobs = [
+      Job(
+        cluster = 'devcluster',
+        role = 'www-data',
+        environment = 'test',
+        name = 'cron_hello_world',
+        cron_schedule = '*/5 * * * *',
+        task = SimpleTask(
+          'cron_hello_world',
+          'echo "Hello world from cron, the time is now $(date --rfc-822)"'),
+      ),
+    ]
+
+## Collision Policies
+
+The `cron_collision_policy` field specifies the scheduler's behavior when a new cron job is
+triggered while an older run hasn't finished. The scheduler has two policies available:
+
+* `KILL_EXISTING`: The default policy - on a collision the old instances are killed and instances
+with the current configuration are started.
+* `CANCEL_NEW`: On a collision the new run is cancelled.
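The two policies above can be sketched as a small hypothetical helper (an illustration of the decision, not the scheduler's implementation):

```python
def on_trigger(policy, old_run_still_active):
    # Returns the actions taken when a cron trigger fires while a previous
    # run may still be active. Illustration only, not scheduler code.
    if not old_run_still_active:
        return ['start new run']
    if policy == 'KILL_EXISTING':
        return ['kill old instances', 'start new run']
    if policy == 'CANCEL_NEW':
        return []  # the new run is cancelled; old instances keep running
    raise ValueError('unknown cron_collision_policy: %s' % policy)

assert on_trigger('KILL_EXISTING', True) == ['kill old instances', 'start new run']
assert on_trigger('CANCEL_NEW', True) == []
```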
+
+Note that the use of `CANCEL_NEW` is likely a code smell - interrupted cron jobs should be able
+to recover their progress on a subsequent invocation, otherwise they risk having their work queue
+grow faster than they can process it.
+
+## Failure recovery
+
+Unlike services, which Aurora will always re-execute regardless of exit status, instances of
+cron jobs retry according to the `max_task_failures` attribute of the
+[Task](../../reference/configuration/#task-object) object. To get "run-until-success" semantics,
+set `max_task_failures` to `-1`.
+
+## Interacting with cron jobs via the Aurora CLI
+
+Most interaction with cron jobs takes place using the `cron` subcommand. See `aurora cron -h`
+for up-to-date usage instructions.
+
+### cron schedule
+Schedules a new cron job on the Aurora cluster for later runs or replaces the existing cron template
+with a new one. Only future runs will be affected; any existing active tasks are left intact.
+
+    $ aurora cron schedule devcluster/www-data/test/cron_hello_world /vagrant/examples/jobs/cron_hello_world.aurora
+
+### cron deschedule
+Deschedules a cron job, preventing future runs but allowing current runs to complete.
+
+    $ aurora cron deschedule devcluster/www-data/test/cron_hello_world
+
+### cron start
+Starts a cron job immediately, outside of its normal cron schedule.
+
+    $ aurora cron start devcluster/www-data/test/cron_hello_world
+
+### job killall, job restart, job kill
+Cron jobs create instances running on the cluster that you can interact with like normal Aurora
+tasks with `job kill` and `job restart`.
+
+
+## Technical Note About Syntax
+
+`cron_schedule` uses a restricted subset of BSD crontab syntax. While the
+execution engine currently uses Quartz, the schedule parsing is custom, a subset of FreeBSD
+[crontab(5)](http://www.freebsd.org/cgi/man.cgi?crontab(5)) syntax.
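A minimal sanity check for the five-field crontab shape can be sketched as follows. This is a toy illustration only, not the parser the scheduler actually uses:

```python
def has_crontab_shape(schedule):
    # Rough shape check only: a crontab entry has five whitespace-separated
    # fields (minute, hour, day-of-month, month, day-of-week). The real
    # parser also validates ranges, steps, lists, and names like FRI.
    return len(schedule.split()) == 5

assert has_crontab_shape('*/5 * * * *')     # every 5 minutes
assert not has_crontab_shape('0 3 1,15 *')  # missing the day-of-week field
```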
See
+[the source](https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/cron/CrontabEntry.java#L106-L124)
+for details.
+
+
+## Caveats
+
+### Failovers
+No failover recovery. Aurora does not record the latest minute it fired
+triggers for across failovers. Therefore it's possible to miss triggers
+on failover. Note that this behavior may change in the future.
+
+It's necessary to sync time between schedulers with something like `ntpd`.
+Clock skew could cause double or missed triggers in the case of a failover.
+
+### Collision policy is best-effort
+Aurora aims to always have *at least one copy* of a given instance running at a time - it's
+an AP system, meaning it chooses Availability and Partition Tolerance at the expense of
+Consistency.
+
+If your collision policy was `CANCEL_NEW` and a task has terminated but
+Aurora has not noticed this, Aurora will go ahead and create your new
+task.
+
+If your collision policy was `KILL_EXISTING` and a task was marked `LOST`
+but not yet GCed, Aurora will go ahead and create your new task without
+attempting to kill the old one (outside the GC interval).
+
+### Timezone Configuration
+Cron timezone is configured independently of JVM timezone with the `-cron_timezone` flag and
+defaults to UTC.
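To illustrate why the timezone flag matters, a schedule such as `0 3 * * *` fires at 03:00 wall-clock time in the configured zone, so the same entry maps to different UTC instants. The sketch below uses Python's `zoneinfo` with `America/Los_Angeles` as a hypothetical override value; it is not Aurora code:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Under the default (UTC) the trigger instant for '0 3 * * *' is 03:00 UTC;
# under a hypothetical -cron_timezone=America/Los_Angeles it is 11:00 UTC,
# since PST is UTC-8 in November.
default_fire = datetime(2017, 11, 11, 3, 0, tzinfo=ZoneInfo('UTC'))
override_fire = datetime(2017, 11, 11, 3, 0, tzinfo=ZoneInfo('America/Los_Angeles'))

assert override_fire.astimezone(ZoneInfo('UTC')).hour == 11
```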
