http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/license.html ---------------------------------------------------------------------- diff --git a/license.html b/license.html index dd551bd..32e6ae2 100644 --- a/license.html +++ b/license.html @@ -13,6 +13,8 @@ + + @@ -81,7 +83,10 @@ - <ul class="current"> + + + + <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1 current"><a class="current reference internal" href="#">License</a></li> <li class="toctree-l1"><a class="reference internal" href="start.html">Quick Start</a></li>
http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/objects.inv ---------------------------------------------------------------------- diff --git a/objects.inv b/objects.inv index 6723914..f884cd0 100644 Binary files a/objects.inv and b/objects.inv differ http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/plugins.html ---------------------------------------------------------------------- diff --git a/plugins.html b/plugins.html index be1cfbb..c956b7b 100644 --- a/plugins.html +++ b/plugins.html @@ -13,6 +13,8 @@ + + @@ -81,7 +83,10 @@ - <ul class="current"> + + + + <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1"><a class="reference internal" href="license.html">License</a></li> <li class="toctree-l1"><a class="reference internal" href="start.html">Quick Start</a></li> @@ -175,7 +180,7 @@ features to its core by simply dropping files in your <code class="docutils literal"><span class="pre">$AIRFLOW_HOME/plugins</span></code> folder.</p> <p>The python modules in the <code class="docutils literal"><span class="pre">plugins</span></code> folder get imported, and <strong>hooks</strong>, <strong>operators</strong>, <strong>macros</strong>, <strong>executors</strong> and web <strong>views</strong> -get integrated to Airflow’s main collections and become available for use.</p> +get integrated to Airflow’s main collections and become available for use.</p> <div class="section" id="what-for"> <h2>What for?<a class="headerlink" href="#what-for" title="Permalink to this headline">¶</a></h2> <p>Airflow offers a generic toolbox for working with data. 
Different @@ -184,16 +189,16 @@ plugins can be a way for companies to customize their Airflow installation to reflect their ecosystem.</p> <p>Plugins can be used as an easy way to write, share and activate new sets of features.</p> -<p>There’s also a need for a set of more complex applications to interact with +<p>There’s also a need for a set of more complex applications to interact with different flavors of data and metadata.</p> <p>Examples:</p> <ul class="simple"> -<li>A set of tools to parse Hive logs and expose Hive metadata (CPU /IO / phases/ skew /...)</li> +<li>A set of tools to parse Hive logs and expose Hive metadata (CPU /IO / phases/ skew /…)</li> <li>An anomaly detection framework, allowing people to collect metrics, set thresholds and alerts</li> <li>An auditing tool, helping understand who accesses what</li> <li>A config-driven SLA monitoring tool, allowing you to set monitored tables and at what time they should land, alert people, and expose visualizations of outages</li> -<li>...</li> +<li>…</li> </ul> </div> <div class="section" id="why-build-on-top-of-airflow"> @@ -204,7 +209,7 @@ they should land, alert people, and expose visualizations of outages</li> <li>A metadata database to store your models</li> <li>Access to your databases, and knowledge of how to connect to them</li> <li>An array of workers that your application can push workload to</li> -<li>Airflow is deployed, you can just piggy back on it’s deployment logistics</li> +<li>Airflow is deployed, you can just piggy back on it’s deployment logistics</li> <li>Basic charting capabilities, underlying libraries and abstractions</li> </ul> </div> @@ -212,7 +217,7 @@ they should land, alert people, and expose visualizations of outages</li> <h2>Interface<a class="headerlink" href="#interface" title="Permalink to this headline">¶</a></h2> <p>To create a plugin you will need to derive the <code class="docutils literal"><span class="pre">airflow.plugins_manager.AirflowPlugin</span></code> class 
and reference the objects -you want to plug into Airflow. Here’s what the class you need to derive +you want to plug into Airflow. Here’s what the class you need to derive looks like:</p> <div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">AirflowPlugin</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span> <span class="c1"># The name of your plugin (str)</span> @@ -277,7 +282,7 @@ definitions in Airflow.</p> <span class="c1"># Creating a flask blueprint to intergrate the templates and static folder</span> <span class="n">bp</span> <span class="o">=</span> <span class="n">Blueprint</span><span class="p">(</span> - <span class="s2">"test_plugin"</span><span class="p">,</span> <span class="n">__name__</span><span class="p">,</span> + <span class="s2">"test_plugin"</span><span class="p">,</span> <span class="vm">__name__</span><span class="p">,</span> <span class="n">template_folder</span><span class="o">=</span><span class="s1">'templates'</span><span class="p">,</span> <span class="c1"># registers airflow/plugins/templates as a Jinja template folder</span> <span class="n">static_folder</span><span class="o">=</span><span class="s1">'static'</span><span class="p">,</span> <span class="n">static_url_path</span><span class="o">=</span><span class="s1">'/static/test_plugin'</span><span class="p">)</span> http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/profiling.html ---------------------------------------------------------------------- diff --git a/profiling.html b/profiling.html index 350c015..9438031 100644 --- a/profiling.html +++ b/profiling.html @@ -13,6 +13,8 @@ + + @@ -81,7 +83,10 @@ - <ul class="current"> + + + + <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1"><a class="reference internal" href="license.html">License</a></li> <li 
class="toctree-l1"><a class="reference internal" href="start.html">Quick Start</a></li> @@ -186,12 +191,12 @@ connections registered in Airflow.</p> <h2>Charts<a class="headerlink" href="#charts" title="Permalink to this headline">¶</a></h2> <p>A simple UI built on top of flask-admin and highcharts allows building data visualizations and charts easily. Fill in a form with a label, SQL, -chart type, pick a source database from your environment’s connectons, +chart type, pick a source database from your environment’s connectons, select a few other options, and save it for later use.</p> <p>You can even use the same templating and macros available when writing airflow pipelines, parameterizing your queries and modifying parameters directly in the URL.</p> -<p>These charts are basic, but they’re easy to create, modify and share.</p> +<p>These charts are basic, but they’re easy to create, modify and share.</p> <div class="section" id="chart-screenshot"> <h3>Chart Screenshot<a class="headerlink" href="#chart-screenshot" title="Permalink to this headline">¶</a></h3> <img alt="_images/chart.png" src="_images/chart.png" /> http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/project.html ---------------------------------------------------------------------- diff --git a/project.html b/project.html index 1bc09d0..a60697e 100644 --- a/project.html +++ b/project.html @@ -13,6 +13,8 @@ + + @@ -81,7 +83,10 @@ - <ul class="current"> + + + + <ul class="current"> <li class="toctree-l1 current"><a class="current reference internal" href="#">Project</a><ul> <li class="toctree-l2"><a class="reference internal" href="#history">History</a></li> <li class="toctree-l2"><a class="reference internal" href="#committers">Committers</a></li> @@ -175,13 +180,13 @@ <p>Airflow was started in October 2014 by Maxime Beauchemin at Airbnb. 
It was open source from the very first commit and officially brought under the Airbnb Github and announced in June 2015.</p> -<p>The project joined the Apache Software Foundation’s incubation program in March 2016.</p> +<p>The project joined the Apache Software Foundation’s incubation program in March 2016.</p> </div> <div class="section" id="committers"> <h2>Committers<a class="headerlink" href="#committers" title="Permalink to this headline">¶</a></h2> <ul class="simple"> -<li>@mistercrunch (Maxime “Max” Beauchemin)</li> -<li>@r39132 (Siddharth “Sid” Anand)</li> +<li>@mistercrunch (Maxime “Max” Beauchemin)</li> +<li>@r39132 (Siddharth “Sid” Anand)</li> <li>@criccomini (Chris Riccomini)</li> <li>@bolkedebruin (Bolke de Bruin)</li> <li>@artwr (Arthur Wiedmer)</li> @@ -190,18 +195,18 @@ the Airbnb Github and announced in June 2015.</p> <li>@aoen (Dan Davydov)</li> <li>@syvineckruyk (Steven Yvinec-Kruyk)</li> </ul> -<p>For the full list of contributors, take a look at <a class="reference external" href="https://github.com/apache/incubator-airflow/graphs/contributors">Airflow’s Github +<p>For the full list of contributors, take a look at <a class="reference external" href="https://github.com/apache/incubator-airflow/graphs/contributors">Airflow’s Github Contributor page:</a></p> </div> <div class="section" id="resources-links"> <h2>Resources & links<a class="headerlink" href="#resources-links" title="Permalink to this headline">¶</a></h2> <ul class="simple"> -<li><a class="reference external" href="http://airflow.apache.org/">Airflow’s official documentation</a></li> +<li><a class="reference external" href="http://airflow.apache.org/">Airflow’s official documentation</a></li> <li>Mailing list (send emails to <code class="docutils literal"><span class="pre">dev-subscribe@airflow.incubator.apache.org</span></code> and/or <code class="docutils literal"><span class="pre">commits-subscribe@airflow.incubator.apache.org</span></code> to subscribe to each)</li> -<li><a 
class="reference external" href="https://issues.apache.org/jira/browse/AIRFLOW">Issues on Apache’s Jira</a></li> +<li><a class="reference external" href="https://issues.apache.org/jira/browse/AIRFLOW">Issues on Apache’s Jira</a></li> <li><a class="reference external" href="https://gitter.im/airbnb/airflow">Gitter (chat) Channel</a></li> <li><a class="reference external" href="https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Links">More resources and links to Airflow related content on the Wiki</a></li> </ul> http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/py-modindex.html ---------------------------------------------------------------------- diff --git a/py-modindex.html b/py-modindex.html index 3f84d0f..4e57fc8 100644 --- a/py-modindex.html +++ b/py-modindex.html @@ -13,6 +13,8 @@ + + @@ -82,7 +84,10 @@ - <ul> + + + + <ul> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1"><a class="reference internal" href="license.html">License</a></li> <li class="toctree-l1"><a class="reference internal" href="start.html">Quick Start</a></li> http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/scheduler.html ---------------------------------------------------------------------- diff --git a/scheduler.html b/scheduler.html index 3f3aa9a..18f7a67 100644 --- a/scheduler.html +++ b/scheduler.html @@ -13,6 +13,8 @@ + + @@ -81,7 +83,10 @@ - <ul class="current"> + + + + <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1"><a class="reference internal" href="license.html">License</a></li> <li class="toctree-l1"><a class="reference internal" href="start.html">Quick Start</a></li> @@ -183,7 +188,7 @@ execute <code class="docutils literal"><span class="pre">airflow</span> <span cl the run stamped <code class="docutils literal"><span class="pre">2016-01-01</span></code> will be trigger 
soon after <code class="docutils literal"><span class="pre">2016-01-01T23:59</span></code>. In other words, the job instance is started once the period it covers has ended.</p> -<p><strong>Let’s Repeat That</strong> The scheduler runs your job one <code class="docutils literal"><span class="pre">schedule_interval</span></code> AFTER the +<p><strong>Let’s Repeat That</strong> The scheduler runs your job one <code class="docutils literal"><span class="pre">schedule_interval</span></code> AFTER the start date, at the END of the period.</p> <p>The scheduler starts an instance of the executor specified in the your <code class="docutils literal"><span class="pre">airflow.cfg</span></code>. If it happens to be the <code class="docutils literal"><span class="pre">LocalExecutor</span></code>, tasks will be @@ -201,7 +206,7 @@ created. <code class="docutils literal"><span class="pre">schedule_interval</spa preferably a <a class="reference external" href="https://en.wikipedia.org/wiki/Cron#CRON_expression">cron expression</a> as a <code class="docutils literal"><span class="pre">str</span></code>, or a <code class="docutils literal"><span class="pre">datetime.timedelta</span></code> object. 
Alternatively, you can also -use one of these cron “preset”:</p> +use one of these cron “preset”:</p> <table border="1" class="docutils"> <colgroup> <col width="15%" /> @@ -216,7 +221,7 @@ use one of these cron “preset”:</p> </thead> <tbody valign="top"> <tr class="row-even"><td><code class="docutils literal"><span class="pre">None</span></code></td> -<td>Don’t schedule, use for exclusively “externally triggered” +<td>Don’t schedule, use for exclusively “externally triggered” DAGs</td> <td> </td> </tr> @@ -263,17 +268,41 @@ series of intervals which the scheduler turn into individual Dag Runs and execut Airflow is that these DAG Runs are atomic, idempotent items, and the scheduler, by default, will examine the lifetime of the DAG (from start to end/now, one interval at a time) and kick off a DAG Run for any interval that has not been run (or has been cleared). This concept is called Catchup.</p> -<p>If your DAG is written to handle it’s own catchup (IE not limited to the interval, but instead to “Now” +<p>If your DAG is written to handle it’s own catchup (IE not limited to the interval, but instead to “Now” for instance.), then you will want to turn catchup off (Either on the DAG itself with <code class="docutils literal"><span class="pre">dag.catchup</span> <span class="pre">=</span> <span class="pre">False</span></code>) or by default at the configuration file level with <code class="docutils literal"><span class="pre">catchup_by_default</span> <span class="pre">=</span> <span class="pre">False</span></code>. 
What this will do, is to instruct the scheduler to only create a DAG Run for the most current instance of the DAG interval series.</p> +<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="sd">"""</span> +<span class="sd">Code that goes along with the Airflow tutorial located at:</span> +<span class="sd">https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py</span> +<span class="sd">"""</span> +<span class="kn">from</span> <span class="nn">airflow</span> <span class="k">import</span> <span class="n">DAG</span> +<span class="kn">from</span> <span class="nn">airflow.operators.bash_operator</span> <span class="k">import</span> <span class="n">BashOperator</span> +<span class="kn">from</span> <span class="nn">datetime</span> <span class="k">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timedelta</span> + + +<span class="n">default_args</span> <span class="o">=</span> <span class="p">{</span> + <span class="s1">'owner'</span><span class="p">:</span> <span class="s1">'airflow'</span><span class="p">,</span> + <span class="s1">'depends_on_past'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span> + <span class="s1">'start_date'</span><span class="p">:</span> <span class="n">datetime</span><span class="p">(</span><span class="mi">2015</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> + <span class="s1">'email'</span><span class="p">:</span> <span class="p">[</span><span class="s1">'[email protected]'</span><span class="p">],</span> + <span class="s1">'email_on_failure'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span> + <span class="s1">'email_on_retry'</span><span class="p">:</span> <span class="kc">False</span><span class="p">,</span> + <span class="s1">'retries'</span><span class="p">:</span> <span 
class="mi">1</span><span class="p">,</span> + <span class="s1">'retry_delay'</span><span class="p">:</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">minutes</span><span class="o">=</span><span class="mi">5</span><span class="p">),</span> + <span class="s1">'schedule_interval'</span><span class="p">:</span> <span class="s1">'@hourly'</span><span class="p">,</span> +<span class="p">}</span> + +<span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span><span class="s1">'tutorial'</span><span class="p">,</span> <span class="n">catchup</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">default_args</span><span class="o">=</span><span class="n">default_args</span><span class="p">)</span> +</pre></div> +</div> <p>In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM, (or from the command line), a single DAG Run will be created, with an <code class="docutils literal"><span class="pre">execution_date</span></code> of 2016-01-01, and the next one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.</p> <p>If the <code class="docutils literal"><span class="pre">dag.catchup</span></code> value had been True instead, the scheduler would have created a DAG Run for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval -hasn’t completed) and the scheduler will execute them sequentially. This behavior is great for atomic +hasn’t completed) and the scheduler will execute them sequentially. This behavior is great for atomic datasets that can easily be split into periods. 
Turning catchup off is great if your DAG Runs perform backfill internally.</p> </div> @@ -282,7 +311,7 @@ backfill internally.</p> <p>Note that <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Runs</span></code> can also be created manually through the CLI while running an <code class="docutils literal"><span class="pre">airflow</span> <span class="pre">trigger_dag</span></code> command, where you can define a specific <code class="docutils literal"><span class="pre">run_id</span></code>. The <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Runs</span></code> created externally to the -scheduler get associated to the trigger’s timestamp, and will be displayed +scheduler get associated to the trigger’s timestamp, and will be displayed in the UI alongside scheduled <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">runs</span></code>.</p> </div> <div class="section" id="to-keep-in-mind"> @@ -291,17 +320,28 @@ in the UI alongside scheduled <code class="docutils literal"><span class="pre">D <li>The first <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Run</span></code> is created based on the minimum <code class="docutils literal"><span class="pre">start_date</span></code> for the tasks in your DAG.</li> <li>Subsequent <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Runs</span></code> are created by the scheduler process, based on -your DAG’s <code class="docutils literal"><span class="pre">schedule_interval</span></code>, sequentially.</li> -<li>When clearing a set of tasks’ state in hope of getting them to re-run, -it is important to keep in mind the <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Run</span></code>‘s state too as it defines +your DAG’s <code class="docutils literal"><span class="pre">schedule_interval</span></code>, sequentially.</li> +<li>When clearing a set of tasks’ state in 
hope of getting them to re-run, +it is important to keep in mind the <code class="docutils literal"><span class="pre">DAG</span> <span class="pre">Run</span></code>’s state too as it defines whether the scheduler should look into triggering tasks for that run.</li> </ul> <p>Here are some of the ways you can <strong>unblock tasks</strong>:</p> <ul class="simple"> -<li>From the UI, you can <strong>clear</strong> (as in delete the status of) individual task instances from the task instances dialog, while defining whether you want to includes the past/future and the upstream/downstream dependencies. Note that a confirmation window comes next and allows you to see the set you are about to clear.</li> -<li>The CLI command <code class="docutils literal"><span class="pre">airflow</span> <span class="pre">clear</span> <span class="pre">-h</span></code> has lots of options when it comes to clearing task instance states, including specifying date ranges, targeting task_ids by specifying a regular expression, flags for including upstream and downstream relatives, and targeting task instances in specific states (<code class="docutils literal"><span class="pre">failed</span></code>, or <code class="docutils literal"><span class="pre">success</span></code>)</li> -<li>Marking task instances as successful can be done through the UI. This is mostly to fix false negatives, or for instance when the fix has been applied outside of Airflow.</li> -<li>The <code class="docutils literal"><span class="pre">airflow</span> <span class="pre">backfill</span></code> CLI subcommand has a flag to <code class="docutils literal"><span class="pre">--mark_success</span></code> and allows selecting subsections of the DAG as well as specifying date ranges.</li> +<li>From the UI, you can <strong>clear</strong> (as in delete the status of) individual task instances +from the task instances dialog, while defining whether you want to includes the past/future +and the upstream/downstream dependencies. 
Note that a confirmation window comes next and +allows you to see the set you are about to clear. You can also clear all task instances +associated with the dag.</li> +<li>The CLI command <code class="docutils literal"><span class="pre">airflow</span> <span class="pre">clear</span> <span class="pre">-h</span></code> has lots of options when it comes to clearing task instance +states, including specifying date ranges, targeting task_ids by specifying a regular expression, +flags for including upstream and downstream relatives, and targeting task instances in specific +states (<code class="docutils literal"><span class="pre">failed</span></code>, or <code class="docutils literal"><span class="pre">success</span></code>)</li> +<li>Clearing a task instance will no longer delete the task instance record. Instead it updates +max_tries and set the current task instance state to be None.</li> +<li>Marking task instances as successful can be done through the UI. This is mostly to fix false negatives, +or for instance when the fix has been applied outside of Airflow.</li> +<li>The <code class="docutils literal"><span class="pre">airflow</span> <span class="pre">backfill</span></code> CLI subcommand has a flag to <code class="docutils literal"><span class="pre">--mark_success</span></code> and allows selecting +subsections of the DAG as well as specifying date ranges.</li> </ul> </div> </div> http://git-wip-us.apache.org/repos/asf/incubator-airflow-site/blob/28a3eb60/search.html ---------------------------------------------------------------------- diff --git a/search.html b/search.html index eed50cd..5121594 100644 --- a/search.html +++ b/search.html @@ -13,6 +13,8 @@ + + @@ -79,7 +81,10 @@ - <ul> + + + + <ul> <li class="toctree-l1"><a class="reference internal" href="project.html">Project</a></li> <li class="toctree-l1"><a class="reference internal" href="license.html">License</a></li> <li class="toctree-l1"><a class="reference internal" href="start.html">Quick 
Start</a></li>
