This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/airflow-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8df37fd Update asf-site to output generated at 9db07f5
8df37fd is described below
commit 8df37fd63b29d26956c0be04cab099b7dc6f5644
Author: potiuk <[email protected]>
AuthorDate: Mon Aug 17 19:07:48 2020 +0000
Update asf-site to output generated at 9db07f5
---
_gen/indexes/en/blog-index.json | 2 +-
_gen/indexes/en/blog-posts.json | 2 +-
blog/Simple_dag.png | Bin 0 -> 66340 bytes
blog/airflow-1.10.10/index.html | 4 +-
blog/airflow-1.10.8-1.10.9/index.html | 4 +-
blog/airflow-survey/index.html | 4 +-
blog/airflow-ui.png | Bin 0 -> 179332 bytes
blog/announcing-new-website/index.html | 35 ++-
.../index.html | 277 ++++++++++++++-------
.../index.html | 35 ++-
.../index.html | 4 +-
.../index.html | 4 +-
.../index.html | 48 +++-
blog/index.html | 31 +++
blog/index.xml | 166 ++++++++++++
.../index.html | 4 +-
blog/semicomplex.png | Bin 0 -> 109173 bytes
blog/tags/community/index.html | 39 ++-
blog/tags/community/index.xml | 172 ++++++++++++-
index.html | 26 +-
index.xml | 166 ++++++++++++
search/index.html | 4 +-
sitemap.xml | 123 ++++-----
tags/index.html | 2 +-
tags/index.xml | 2 +-
use-cases/adobe/index.html | 4 +-
use-cases/big-fish-games/index.html | 4 +-
use-cases/dish/index.html | 4 +-
use-cases/experity/index.html | 4 +-
use-cases/onefootball/index.html | 4 +-
30 files changed, 973 insertions(+), 201 deletions(-)
diff --git a/_gen/indexes/en/blog-index.json b/_gen/indexes/en/blog-index.json
index 3bed3c5..1e0fc67 100644
--- a/_gen/indexes/en/blog-index.json
+++ b/_gen/indexes/en/blog-index.json
@@ -1 +1 @@
-{"version":"2.3.8","fields":["title","description","author","content","tags","url"],"fieldVectors":[["title/It's
a \"Breeze\" to develop Apache
Airflow",[0,1.906,1,0.855,2,0.163,3,0.38,4,0.479]],["description/It's a
\"Breeze\" to develop Apache
Airflow",[2,0.177,4,0.342,5,1.358,6,0.609,7,1.358,8,0.78,9,0.609,10,0.609,11,0.609,12,0.472,13,1.358,14,0.472,15,1.358,16,0.1]],["author/It's
a \"Breeze\" to develop Apache Airflow",[11,0.855,17,1.095]],["content/It's a
\"Breeze\" to develop Apach [...]
\ No newline at end of file
+{"version":"2.3.8","fields":["title","description","author","content","tags","url"],"fieldVectors":[["title/It's
a \"Breeze\" to develop Apache
Airflow",[0,2.089,1,0.91,2,0.031,3,0.297,4,0.414]],["description/It's a
\"Breeze\" to develop Apache
Airflow",[2,0.032,4,0.277,5,1.402,6,0.733,7,1.402,8,0.886,9,0.733,10,0.733,11,0.733,12,0.611,13,1.402,14,0.611,15,1.402,16,0.217]],["author/It's
a \"Breeze\" to develop Apache Airflow",[11,1.113,17,1.345]],["content/It's a
\"Breeze\" to develop Ap [...]
\ No newline at end of file
diff --git a/_gen/indexes/en/blog-posts.json b/_gen/indexes/en/blog-posts.json
index 41d118a..6901ad5 100644
--- a/_gen/indexes/en/blog-posts.json
+++ b/_gen/indexes/en/blog-posts.json
@@ -1 +1 @@
-[{"content":"## The story behind the Airflow Breeze tool\nInitially, we
started contributing to this fantastic open-source project [Apache Airflow]
with a team of three which then grew to five. When we kicked it off a year ago,
I realized pretty soon where the biggest bottlenecks and areas for improvement
in terms of productivity were. Even with the help of our client, who provided
us with a “homegrown” development environment it took us literally days to set
it up and learn some basics. [...]
\ No newline at end of file
+[{"content":"## The story behind the Airflow Breeze tool\nInitially, we
started contributing to this fantastic open-source project [Apache Airflow]
with a team of three which then grew to five. When we kicked it off a year ago,
I realized pretty soon where the biggest bottlenecks and areas for improvement
in terms of productivity were. Even with the help of our client, who provided
us with a “homegrown” development environment it took us literally days to set
it up and learn some basics. [...]
\ No newline at end of file
diff --git a/blog/Simple_dag.png b/blog/Simple_dag.png
new file mode 100644
index 0000000..c72af63
Binary files /dev/null and b/blog/Simple_dag.png differ
diff --git a/blog/airflow-1.10.10/index.html b/blog/airflow-1.10.10/index.html
index db34361..6780359 100644
--- a/blog/airflow-1.10.10/index.html
+++ b/blog/airflow-1.10.10/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2020-04-09T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Apache Airflow 1.10.10">
<meta itemprop="description" content="We are happy to present Apache Airflow
1.10.10">
<meta itemprop="datePublished" content="2020-04-09T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="1143">
diff --git a/blog/airflow-1.10.8-1.10.9/index.html
b/blog/airflow-1.10.8-1.10.9/index.html
index cd4dc55..ec40ba8 100644
--- a/blog/airflow-1.10.8-1.10.9/index.html
+++ b/blog/airflow-1.10.8-1.10.9/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2020-02-23T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Apache Airflow 1.10.8 & 1.10.9">
<meta itemprop="description" content="We are happy to present the new 1.10.8
and 1.10.9 releases of Apache Airflow.">
<meta itemprop="datePublished" content="2020-02-23T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="437">
diff --git a/blog/airflow-survey/index.html b/blog/airflow-survey/index.html
index ca32f86..e186daa 100644
--- a/blog/airflow-survey/index.html
+++ b/blog/airflow-survey/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-12-11T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Airflow Survey 2019">
<meta itemprop="description" content="Receiving and adjusting to our users’
feedback is a must. Let’s see who Airflow users are, how they play with it, and
what they miss.">
<meta itemprop="datePublished" content="2019-12-11T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="1775">
diff --git a/blog/airflow-ui.png b/blog/airflow-ui.png
new file mode 100644
index 0000000..d6a8221
Binary files /dev/null and b/blog/airflow-ui.png differ
diff --git a/blog/announcing-new-website/index.html
b/blog/announcing-new-website/index.html
index e88971e..882828e 100644
--- a/blog/announcing-new-website/index.html
+++ b/blog/announcing-new-website/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-12-11T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="New Airflow website">
<meta itemprop="description" content="We are thrilled about our new website!">
<meta itemprop="datePublished" content="2019-12-11T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="282">
@@ -547,6 +547,37 @@ and that its improved accessibility increases adoption and
use of Apache Airflow
</div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Mon, Aug 17, 2020</span>
+ </div>
+ <p class="box-event__blogpost--header">Apache Airflow For Newcomers</p>
+ <p class="box-event__blogpost--author">Ephraim Anierobi</p>
+ <p class="box-event__blogpost--description"></p>
+ <div class="mt-auto">
+ <a href="/blog/apache-airflow-for-new-comers/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
+ </div>
<span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Fri, Nov 22, 2019</span>
</div>
<p class="box-event__blogpost--header">ApacheCon Europe 2019 —
Thoughts and Insights by Airflow Committers</p>
diff --git a/blog/airflow-1.10.8-1.10.9/index.html
b/blog/apache-airflow-for-new-comers/index.html
similarity index 79%
copy from blog/airflow-1.10.8-1.10.9/index.html
copy to blog/apache-airflow-for-new-comers/index.html
index cd4dc55..ab2bc2d 100644
--- a/blog/airflow-1.10.8-1.10.9/index.html
+++ b/blog/apache-airflow-for-new-comers/index.html
@@ -29,30 +29,33 @@
<meta name="msapplication-TileImage" content="/favicons/ms-icon-144x144.png">
<meta name="theme-color" content="#ffffff">
-<title>Apache Airflow 1.10.8 & 1.10.9 | Apache Airflow</title><meta
property="og:title" content="Apache Airflow 1.10.8 & 1.10.9" />
-<meta property="og:description" content="We are happy to present the new
1.10.8 and 1.10.9 releases of Apache Airflow." />
+<title>Apache Airflow For Newcomers | Apache Airflow</title><meta
property="og:title" content="Apache Airflow For Newcomers" />
+<meta property="og:description" content="Apache Airflow is a platform to
programmatically author, schedule, and monitor workflows. A workflow is a
sequence of tasks that processes a set of data. You can think of workflow as
the path that describes how tasks go from being undone to done. Scheduling, on
the other hand, is the process of planning, controlling, and optimizing when a
particular task should be done.
+Authoring Workflow in Apache Airflow. Airflow makes it easy to author
workflows using python scripts." />
<meta property="og:type" content="article" />
-<meta property="og:url" content="/blog/airflow-1.10.8-1.10.9/" />
+<meta property="og:url" content="/blog/apache-airflow-for-new-comers/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:published_time" content="2020-02-23T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
-<meta itemprop="name" content="Apache Airflow 1.10.8 & 1.10.9">
-<meta itemprop="description" content="We are happy to present the new 1.10.8
and 1.10.9 releases of Apache Airflow.">
+<meta property="article:published_time" content="2020-08-17T00:00:00+00:00" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta itemprop="name" content="Apache Airflow For Newcomers">
+<meta itemprop="description" content="Apache Airflow is a platform to
programmatically author, schedule, and monitor workflows. A workflow is a
sequence of tasks that processes a set of data. You can think of workflow as
the path that describes how tasks go from being undone to done. Scheduling, on
the other hand, is the process of planning, controlling, and optimizing when a
particular task should be done.
+Authoring Workflow in Apache Airflow. Airflow makes it easy to author
workflows using python scripts.">
-<meta itemprop="datePublished" content="2020-02-23T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
-<meta itemprop="wordCount" content="437">
+<meta itemprop="datePublished" content="2020-08-17T00:00:00+00:00" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
+<meta itemprop="wordCount" content="1107">
-<meta itemprop="keywords" content="release," />
+<meta itemprop="keywords" content="Community," />
<meta name="twitter:card" content="summary_large_image"/>
<meta name="twitter:image" content="/images/feature-image.png"/>
-<meta name="twitter:title" content="Apache Airflow 1.10.8 & 1.10.9"/>
-<meta name="twitter:description" content="We are happy to present the new
1.10.8 and 1.10.9 releases of Apache Airflow."/>
+<meta name="twitter:title" content="Apache Airflow For Newcomers"/>
+<meta name="twitter:description" content="Apache Airflow is a platform to
programmatically author, schedule, and monitor workflows. A workflow is a
sequence of tasks that processes a set of data. You can think of workflow as
the path that describes how tasks go from being undone to done. Scheduling, on
the other hand, is the process of planning, controlling, and optimizing when a
particular task should be done.
+Authoring Workflow in Apache Airflow. Airflow makes it easy to author
workflows using python scripts."/>
<script type="application/javascript">
@@ -80,7 +83,7 @@ if (!doNotTrack) {
crossorigin="anonymous"></script>
-<meta name="description" content="We are happy to present the new 1.10.8 and
1.10.9 releases of Apache Airflow." />
+<meta name="description" content="Platform created by the community to
programmatically author, schedule and monitor workflows." />
@@ -407,19 +410,19 @@ if (!doNotTrack) {
<div class="tags-container">
- <a class="tag" href="/blog/tags/release/">Release</a>
+ <a class="tag" href="/blog/tags/community/">Community</a>
</div>
- <span class="bodytext__medium--brownish-grey">Sun, Feb 23, 2020</span>
+ <span class="bodytext__medium--brownish-grey">Mon, Aug 17, 2020</span>
</div>
- <p class="blogpost-content__metadata--title">Apache Airflow 1.10.8 &
1.10.9</p>
+ <p class="blogpost-content__metadata--title">Apache Airflow For
Newcomers</p>
<div class="blogpost-content__metadata--author">
<span class="blogpost-content__metadata--author">
- Kaxil Naik
+ Ephraim Anierobi
</span>
- <a href="https://twitter.com/kaxil/"
class="blogpost-content__metadata--social-media-icon">
+ <a href="https://twitter.com/ephraimbuddy/"
class="blogpost-content__metadata--social-media-icon">
<svg xmlns="http://www.w3.org/2000/svg" width="22" height="21"
viewBox="0 0 22 21">
<g id="Group_1746" data-name="Group 1746" transform="translate(.076
-.055)">
<ellipse id="Ellipse_19" cx="11" cy="10.5" fill="#51504f"
data-name="Ellipse 19" rx="11" ry="10.5"
@@ -435,7 +438,7 @@ if (!doNotTrack) {
</a>
- <a href="https://github.com/kaxil/"
class="blogpost-content__metadata--social-media-icon">
+ <a href="https://github.com/ephraimbuddy/"
class="blogpost-content__metadata--social-media-icon">
<svg xmlns="http://www.w3.org/2000/svg" width="21.737"
height="21.2" viewBox="0 0 21.737 21.2">
<path id="Path_1378" d="M33.971 1181.31a10.87 10.87 0 0 0-3.435
21.182c.543.1.742-.236.742-.524
0-.258-.009-.941-.015-1.848-3.023.657-3.661-1.457-3.661-1.457a2.876 2.876 0 0
0-1.207-1.59c-.987-.674.075-.661.075-.661a2.283 2.283 0 0 1 1.665 1.12 2.314
2.314 0 0 0 3.163.9 2.322 2.322 0 0 1
.69-1.453c-2.413-.274-4.951-1.207-4.951-5.371a4.2 4.2 0 0 1 1.119-2.917 3.908
3.908 0 0 1 .107-2.876s.913-.292 2.989 1.114a10.3 10.3 0 0 1 5.442
0c2.075-1.406 2.986-1.114 2.986-1.114a3.9 3.9 0 0 1 .1 [...]
</svg>
@@ -443,92 +446,163 @@ if (!doNotTrack) {
</a>
- <a href="https://linkedin.com/in/kaxil/"
class="blogpost-content__metadata--social-media-icon">
- <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20"
viewBox="0 0 20 21">
- <g id="Group_1745" data-name="Group 1745" transform="translate(.155
-.055)">
- <ellipse id="Ellipse_20" cx="10" cy="10.5" fill="#51504f"
data-name="Ellipse 20" rx="10" ry="10.5" transform="translate(-.155 .055)"/>
- <g id="Group_698" data-name="Group 698" transform="translate(5.843
5.004)">
- <path id="Path_644" d="M-1311.072 1423.962a.9.9 0 0
1-.972.9.888.888 0 0 1-.937-.9.9.9 0 0 1 .961-.9.892.892 0 0 1 .948.9zm-1.862
7.413v-5.8h1.8v5.8z" fill="#fff" data-name="Path 644"
transform="translate(1312.981 -1423.061)"/>
- <path id="Path_645" d="M-1284.253
1448.767c0-.723-.024-1.328-.047-1.85h1.565l.083.806h.035a2.084 2.084 0 0 1
1.791-.937c1.186 0 2.076.795 2.076
2.5v3.428h-1.8v-3.214c0-.747-.261-1.257-.914-1.257a.989.989 0 0 0-.925.676 1.29
1.29 0 0 0-.06.451v3.345h-1.8z" fill="#fff" data-name="Path 645"
transform="translate(1287.182 -1444.402)"/>
- </g>
- </g>
-</svg>
-
- </a>
-
</div>
- <p class="blogpost-content__metadata--description">We are happy to present
the new 1.10.8 and 1.10.9 releases of Apache Airflow.</p>
+ <p class="blogpost-content__metadata--description"></p>
</div>
<div class="markdown-content">
-<p>Airflow 1.10.8 contains 160 commits since 1.10.7 and includes 4 new
features, 42 improvements, 36 bug fixes, and several doc changes.</p>
+<p>Apache Airflow is a platform to programmatically author, schedule, and
monitor workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of a workflow as the
+path that describes how tasks go from being undone to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.</p>
-<p>We released 1.10.9 on the same day as one of the Flask dependencies
(Werkzeug) released 1.0 which broke Airflow 1.10.8.</p>
+<h3 id="authoring-workflow-in-apache-airflow">Authoring Workflow in Apache
Airflow.</h3>
-<p><strong>Details</strong>:</p>
+<p>Airflow makes it easy to author workflows using Python scripts. A <a
href="https://en.wikipedia.org/wiki/Directed_acyclic_graph"
target="_blank">Directed Acyclic Graph</a>
+(DAG) represents a workflow in Airflow. It is a collection of tasks arranged
in a way that shows each task’s
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them according to the tasks’ relationships and dependencies. If task B
depends on the successful
+execution of another task A, Airflow will run task A and run task B only
after task A has completed successfully.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as</p>
-<ul>
-<li><strong>PyPI</strong>: <a
href="https://pypi.org/project/apache-airflow/1.10.9/"
target="_blank">https://pypi.org/project/apache-airflow/1.10.9/</a></li>
-<li><strong>Docs</strong>: <a href="https://airflow.apache.org/docs/1.10.9/"
target="_blank">https://airflow.apache.org/docs/1.10.9/</a></li>
-<li><strong>Changelog (1.10.8)</strong>: <a
href="http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07"
target="_blank">http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07</a></li>
-<li><strong>Changelog (1.10.9)</strong>: <a
href="http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10"
target="_blank">http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10</a></li>
-</ul>
+<pre><code class="language-python">task_A >> task_B
+</code></pre>
+
+<p>Also equivalent to</p>
+
+<pre><code class="language-python">task_A.set_downstream(task_B)
+</code></pre>
+
+<p><img src="Simple_dag.png" alt="Simple Dag" /></p>
+
+<p>This tells Airflow that it needs to execute task A before task B. Tasks
can have far more complex
+relationships to each other than the one expressed above, and Airflow figures
out how and when to execute
+the tasks following their relationships and dependencies.
+<img src="semicomplex.png" alt="Complex Dag" /></p>
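The chaining above can be sketched in plain Python. This is a toy illustration, not the airflow library: `Task` and `run_order` here are hypothetical stand-ins for Airflow's own classes, shown only to make the `>>` dependency idea concrete.

```python
# Toy sketch (NOT the airflow library): model ">>" chaining and show that
# a task only runs after all of its upstream tasks have run.
class Task:
    def __init__(self, name):
        self.name = name
        self.upstream = []

    def __rshift__(self, other):
        # task_A >> task_B means task_B depends on task_A
        other.upstream.append(self)
        return other

    def set_downstream(self, other):
        # equivalent to task_A >> task_B
        self.__rshift__(other)


def run_order(tasks):
    # Repeatedly pick tasks whose upstream tasks are all done
    # (assumes the graph is acyclic, as a DAG must be).
    done, order = set(), []
    while len(done) < len(tasks):
        for t in tasks:
            if t.name not in done and all(u.name in done for u in t.upstream):
                done.add(t.name)
                order.append(t.name)
    return order


task_A, task_B = Task("A"), Task("B")
task_A >> task_B
print(run_order([task_B, task_A]))  # ['A', 'B']
```

Even though task B is listed first, it is scheduled only after task A, mirroring how Airflow resolves dependencies before execution.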
+
+<p>Before we discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflows easy, let us discuss the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>.</p>
+
+<h3 id="breeze-environment">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow, where
you can run tests, build images,
+build documentation, and much more. There are excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze environment.
+Please check them out. You enter the Breeze environment by running the
<code>./breeze</code> script. You can run all
+the commands mentioned here in the Breeze environment.</p>
+
+<h3 id="scheduler">Scheduler</h3>
+
+<p>The scheduler is the component that monitors DAGs and triggers the tasks
whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggering them once they
+are ready. It accomplishes this by spawning a process that runs periodically
(every minute or so),
+reading the metadata database to check the status of each task and deciding
what needs to be done.
+The metadata database is where the status of every task is recorded. The
status can be one of running,
+success, failed, etc.</p>
-<p>Some of the noteworthy new features (user-facing) are:</p>
+<p>A task is said to be ready when its dependencies have been met. The
dependencies include all the data
+necessary for the task to be executed. Note that the scheduler won’t
trigger your tasks until
+the period they cover has ended. If a task’s
<code>schedule_interval</code> is <code>@daily</code>, the scheduler triggers
the task
+at the end of the day, not at the beginning. This ensures that the data
needed for the task
+is ready. It is also possible to trigger tasks manually from the UI.</p>
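The end-of-period rule can be illustrated with a tiny helper. This is a hypothetical sketch, not Airflow's actual scheduler code: it only shows why a run covering a period fires once that period has ended.

```python
# Hypothetical helper (not Airflow's scheduler): a run for a period is
# triggered only once the period it covers has ended.
from datetime import datetime, timedelta


def trigger_time(period_start, schedule_interval):
    # A daily run covering 2020-08-17 fires at 2020-08-18 00:00,
    # when all of that day's data can exist.
    return period_start + schedule_interval


start = datetime(2020, 8, 17)
print(trigger_time(start, timedelta(days=1)))  # 2020-08-18 00:00:00
```

So an `@daily` task whose period starts at midnight is triggered at the following midnight, never at the start of the day it covers.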
+
+<p>In the <a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>, the scheduler is started by running the
command <code>airflow scheduler</code>. It uses
+the configured environment; the configuration can be specified in
<code>airflow.cfg</code>.</p>
+
+<h3 id="executor">Executor</h3>
+
+<p>Executors are responsible for running tasks. They work with the scheduler
to get information about
+what resources are needed to run a task as the task is queued.</p>
+
+<p>By default, Airflow uses the <a
href="https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor"
target="_blank">SequentialExecutor</a>.
+ However, this executor is limited, and it is the only executor that can be
used with SQLite.</p>
+
+<p>There are many other <a
href="https://airflow.apache.org/docs/stable/executor/index.html"
target="_blank">executors</a>;
+ the difference lies in the resources they have and how they choose to use
them. The available executors
+ are:</p>
<ul>
-<li><a href="https://github.com/apache/airflow/pull/6489" target="_blank">Add
tags to DAGs and use it for filtering in the UI (RBAC only)</a></li>
-<li><a href="http://airflow.apache.org/docs/1.10.9/executor/debug.html"
target="_blank">New Executor: DebugExecutor for Local debugging from your
IDE</a></li>
-<li><a href="https://github.com/apache/airflow/pull/7281"
target="_blank">Allow passing conf in “Add DAG Run” (Triggered
Dags) view</a></li>
-<li><a href="https://github.com/apache/airflow/pull/7038"
target="_blank">Allow dags to run for future execution dates for manually
triggered DAGs (only if <code>schedule_interval=None</code>)</a></li>
-<li><a href="https://airflow.apache.org/docs/1.10.9/configurations-ref.html"
target="_blank">Dedicated page in documentation for all configs in
airflow.cfg</a></li>
+<li>Sequential Executor</li>
+<li>Debug Executor</li>
+<li>Local Executor</li>
+<li>Dask Executor</li>
+<li>Celery Executor</li>
+<li>Kubernetes Executor</li>
+<li>Scaling Out with Mesos (community contributed)</li>
</ul>
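The difference between one-at-a-time and pooled execution can be sketched roughly in plain Python. These are hypothetical classes, not the airflow library: `SequentialRunner` mimics the SequentialExecutor's one-task-at-a-time behavior, while `PooledRunner` mimics how executors like the LocalExecutor or CeleryExecutor hand tasks to multiple workers.

```python
# Rough sketch (NOT the airflow library): one task at a time vs. a pool
# of workers pulling tasks concurrently.
from concurrent.futures import ThreadPoolExecutor


class SequentialRunner:
    def run(self, tasks):
        # Execute callables strictly one after another.
        return [t() for t in tasks]


class PooledRunner:
    def __init__(self, workers=4):
        self.workers = workers

    def run(self, tasks):
        # Several workers execute tasks concurrently; results keep task order.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(lambda t: t(), tasks))


tasks = [lambda i=i: i * i for i in range(4)]
print(SequentialRunner().run(tasks))  # [0, 1, 4, 9]
print(PooledRunner().run(tasks))      # [0, 1, 4, 9]
```

Both produce the same results here; the pooled version simply overlaps the work, which is what makes distributed executors attractive for larger workloads.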
-<h3 id="add-tags-to-dags-and-use-it-for-filtering-in-the-ui">Add tags to DAGs
and use it for filtering in the UI</h3>
+<p>The CeleryExecutor is a better choice than the SequentialExecutor. The
CeleryExecutor uses several
+workers to execute a job in a distributed way. If a worker node ever goes
down, the CeleryExecutor assigns its
+tasks to another worker node. This ensures high availability.</p>
-<p>In order to filter DAGs (e.g by team), you can add tags in each dag. The
filter is saved in a cookie and can be reset by the reset button.</p>
+<p>The CeleryExecutor works closely with the scheduler, which adds a message
to the queue, and the Celery broker,
+which delivers the message to a Celery worker for execution.
+You can find more information about the CeleryExecutor and how to configure it
in the
+<a
href="https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor"
target="_blank">documentation</a>.</p>
-<p>For example:</p>
+<h3 id="webserver">Webserver</h3>
-<p>In your Dag file, pass a list of tags you want to add to DAG object:</p>
+<p>The webserver provides the web interface (UI) for Airflow. The UI is
feature-rich and makes it easy to
+monitor and troubleshoot DAGs and tasks.</p>
-<pre><code class="language-python">dag = DAG(
- dag_id='example_dag_tag',
- schedule_interval='0 0 * * *',
- tags=['example']
-)
-</code></pre>
+<p><img src="airflow-ui.png" alt="airflow UI" /></p>
-<p><strong>Screenshot</strong>:
-<img src="airflow-dag-tags.png" alt="Add filter by DAG tags" /></p>
+<p>There are many actions you can perform in the UI. You can trigger a task
and monitor its execution,
+including its duration. The UI also makes it possible to view a task’s
dependencies in a
+tree view and a graph view, and you can view task logs in the UI.</p>
-<p><strong>Note</strong>: This feature is only available for the RBAC UI
(enabled using <code>rbac=True</code> in <code>[webserver]</code> section in
your <code>airflow.cfg</code>).</p>
+<p>The web UI is started with the command <code>airflow webserver</code> in
the Breeze environment.</p>
-<h2 id="special-note-deprecations">Special Note / Deprecations</h2>
+<h3 id="backend">Backend</h3>
-<h3 id="python-2">Python 2</h3>
+<p>By default, Airflow uses the SQLite backend for storing configuration
information, DAG states,
+and much other useful information. SQLite should not be used in production,
as it can lead to data
+loss.</p>
-<p>Python 2 has reached end of its life on Jan 2020. Airflow Master no longer
supports Python 2.
-Airflow 1.10.* would be the last series to support Python 2.</p>
+<p>You can use PostgreSQL or MySQL as the backend for Airflow, and it is easy
to switch to either of them.</p>
-<p>We strongly recommend users to use Python >= 3.6</p>
+<p>The command <code>./breeze --backend mysql</code> selects MySQL as the
backend when starting the Breeze environment.</p>
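Outside Breeze, the backend is configured through the metadata database connection string in `airflow.cfg`. A minimal sketch, assuming Airflow 1.10's `sql_alchemy_conn` option in the `[core]` section; the host, user, and database names below are placeholders:

```ini
# airflow.cfg — switch the metadata database from SQLite to PostgreSQL
# (connection details here are placeholders, not real credentials)
[core]
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db
```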
-<h3 id="use-airflow-rbac-ui">Use Airflow RBAC UI</h3>
+<h3 id="operators">Operators</h3>
-<p>Airflow 1.10.9 ships with 2 UIs, the default is non-RBAC Flask-admin based
UI and Flask-appbuilder based UI.</p>
+<p>Operators determine what gets done by a task. Airflow has a lot of
built-in operators, and each operator
+does a specific kind of task. There’s a BashOperator that executes a
bash command, the PythonOperator which
+calls a Python function, the AwsBatchOperator which executes a job on AWS
Batch, and <a href="https://airflow.apache.org/docs/stable/concepts.html#operators"
target="_blank">many more</a>.</p>
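The operator idea can be sketched as follows. These are simplified, hypothetical classes, not the airflow library's operators: each one wraps a single kind of work behind a common `execute()` method, which is the essence of what real operators do.

```python
# Simplified sketch (NOT the airflow library): each operator type knows
# how to do one kind of work, exposed through a common execute() method.
import subprocess


class PythonLikeOperator:
    def __init__(self, python_callable):
        self.python_callable = python_callable

    def execute(self):
        # Like a PythonOperator: call the given Python function.
        return self.python_callable()


class BashLikeOperator:
    def __init__(self, bash_command):
        self.bash_command = bash_command

    def execute(self):
        # Like a BashOperator: run a shell command and capture its output.
        out = subprocess.run(self.bash_command, shell=True,
                             capture_output=True, text=True)
        return out.stdout.strip()


print(PythonLikeOperator(lambda: "hello from python").execute())
print(BashLikeOperator("echo hello from bash").execute())
```

A task in a DAG is then just an operator instance; Airflow decides when to call its execute step based on the dependencies described earlier.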
-<p>The Flask-AppBuilder (FAB) based UI is allows Role-based Access Control and
has more advanced features compared to
-the legacy Flask-admin based UI. This UI can be enabled by setting
<code>rbac=True</code> in <code>[webserver]</code> section in your
<code>airflow.cfg</code>.</p>
+<h4 id="sensors">Sensors</h4>
-<p>Flask-admin based UI is deprecated and new features won’t be ported
to it. This UI will still be the default
-for 1.10.* series but would no longer be available from Airflow 2.0</p>
+<p>Sensors can be described as special operators that are used to monitor a
long-running task.
+Just like operators, there are many predefined sensors in Airflow. These
include:</p>
-<h2 id="list-of-contributors">List of Contributors</h2>
+<ul>
+<li>AthenaSensor: polls the state of the query until it reaches a failure
state or a success state.</li>
+<li>AzureCosmosDocumentSensor: checks for the existence of a document which
matches the given query in CosmosDB.</li>
+<li>GoogleCloudStorageObjectSensor: checks for the existence of a file in
Google Cloud Storage.</li>
+</ul>
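The polling behaviour these sensors share can be sketched in a few lines. This is a toy illustration, not the airflow library: `poke_until` is a hypothetical helper showing the check-sleep-repeat loop that sensors run until the condition holds or a timeout expires.

```python
# Toy sketch (NOT the airflow library) of the "poke" pattern sensors use:
# keep checking a condition until it succeeds or the timeout expires.
import time


def poke_until(condition, timeout=5.0, poke_interval=0.1):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():          # e.g. "does the file exist in storage yet?"
            return True          # dependency met: downstream tasks may run
        time.sleep(poke_interval)
    return False                 # timed out: the sensor fails the task


appeared_at = time.monotonic() + 0.3
print(poke_until(lambda: time.monotonic() >= appeared_at))  # True
```

Real sensors differ mainly in what `condition` checks (a query state, a document, a file), while the surrounding loop looks much like this.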
-<p>According to git shortlog, the following people contributed to the 1.10.8
and 1.10.9 release. Thank you to all contributors!</p>
+<p>A list of most of the available sensors can be found in this <a
href="https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors"
target="_blank">module</a>.</p>
-<p>Anita Fronczak, Ash Berlin-Taylor, BasPH, Bharat Kashyap, Bharath Palaksha,
Bhavika Tekwani, Bjorn Olsen, Brian Phillips, Cooper Gillan, Daniel Cohen,
Daniel Imberman, Daniel Standish, Gabriel Eckers, Hossein Torabi, Igor Khrol,
Jacob, Jarek Potiuk, Jay, Jiajie Zhong, Jithin Sukumar, Kamil Breguła, Kaxil
Naik, Kousuke Saruta, Mustafa Gök, Noël Bardelot, Oluwafemi Sule, Pete DeJoy,
QP Hou, Qian Yu, Robin Edwards, Ry Walker, Steven van Rossum, Tomek Urbaszek,
Xinbin Huang, Yuen-Kuei Hsu [...]
+<h3 id="breeze-environment-1">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow, where
you can run tests, build images,
+build documentation, and much more. There are excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze environment.
+Please check them out.</p>
+
+<h3 id="contributing-to-airflow">Contributing to Airflow</h3>
+
+<p>Airflow is an open-source project, and everyone is welcome to contribute.
It is easy to get started thanks
+to the excellent <a
href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst"
target="_blank">documentation on how to get started</a>.</p>
+
+<p>I joined the community about 12 weeks ago through the <a
href="https://www.outreachy.org/" target="_blank">Outreachy Program</a> and have
+completed about <a href="https://github.com/apache/airflow/pulls/ephraimbuddy"
target="_blank">40 PRs</a>.</p>
+
+<p>It has been an amazing experience! Thanks to my mentors <a
href="https://github.com/potiuk" target="_blank">Jarek</a> and
+<a href="https://github.com/kaxil" target="_blank">Kaxil</a>, and the
community members especially <a href="https://github.com/mik-laj"
target="_blank">Kamil</a>
+and <a href="https://github.com/turbaszek" target="_blank">Tomek</a> for all
their support. I’m grateful!</p>
+
+<p>Thank you so much, <a href="https://github.com/leahecole"
target="_blank">Leah E. Cole</a>, for your wonderful reviews.</p>
</div>
@@ -586,16 +660,16 @@ for 1.10.* series but would no longer be available from
Airflow 2.0</p>
<div class="pager">
- <a
href="/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/">
+ <a href="/blog/implementing-stable-api-for-apache-airflow/">
<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
>Previous</button>
</a>
- <a href="/blog/airflow-1.10.10/">
+ <a >
-<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
>Next</button>
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
disabled>Next</button>
</a>
</div>
@@ -622,17 +696,48 @@ for 1.10.* series but would no longer be available from
Airflow 2.0</p>
<div class="tags-container">
- <a class="tag" href="/blog/tags/release/">Release</a>
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
+ </div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Wed, Dec 11, 2019</span>
+ </div>
+ <p class="box-event__blogpost--header">New Airflow website</p>
+ <p class="box-event__blogpost--author">Aizhamal Nurmamat kyzy</p>
+ <p class="box-event__blogpost--description">We are thrilled about our
new website!</p>
+ <div class="mt-auto">
+ <a href="/blog/announcing-new-website/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
+ <a class="tag"
href="/blog/tags/community/">Community</a>
</div>
- <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Thu, Apr 9, 2020</span>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Fri, Nov 22, 2019</span>
</div>
- <p class="box-event__blogpost--header">Apache Airflow 1.10.10</p>
- <p class="box-event__blogpost--author">Kaxil Naik</p>
- <p class="box-event__blogpost--description">We are happy to present
Apache Airflow 1.10.10</p>
+ <p class="box-event__blogpost--header">ApacheCon Europe 2019 —
Thoughts and Insights by Airflow Committers</p>
+ <p class="box-event__blogpost--author">Polidea</p>
+ <p class="box-event__blogpost--description">Here come some thoughts by
Airflow committers and contributors from the ApacheCon Europe 2019. Get to know
the ASF community!</p>
<div class="mt-auto">
- <a href="/blog/airflow-1.10.10/">
+ <a
href="/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/">
<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
@@ -656,7 +761,7 @@ for 1.10.* series but would no longer be available from
Airflow 2.0</p>
<div class="base-layout--button">
- <a
href=https://github.com/apache/airflow-site/edit/master/landing-pages/site/content/en/blog/airflow-1.10.8-1.10.9/index.md>
+ <a
href=https://github.com/apache/airflow-site/edit/master/landing-pages/site/content/en/blog/apache-airflow-for-new-comers.md>
diff --git
a/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html
b/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html
index 4f6bc59..e6e497c 100644
---
a/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html
+++
b/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-11-22T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="ApacheCon Europe 2019 — Thoughts and Insights
by Airflow Committers">
<meta itemprop="description" content="Here come some thoughts by Airflow
committers and contributors from the ApacheCon Europe 2019. Get to know the ASF
community!">
<meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="138">
@@ -527,6 +527,37 @@ if (!doNotTrack) {
</div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Mon, Aug 17, 2020</span>
+ </div>
+ <p class="box-event__blogpost--header">Apache Airflow For Newcomers</p>
+ <p class="box-event__blogpost--author">Ephraim Anierobi</p>
+ <p class="box-event__blogpost--description"></p>
+ <div class="mt-auto">
+ <a href="/blog/apache-airflow-for-new-comers/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
+ </div>
<span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Wed, Dec 11, 2019</span>
</div>
<p class="box-event__blogpost--header">New Airflow website</p>
diff --git a/blog/documenting-using-local-development-environments/index.html
b/blog/documenting-using-local-development-environments/index.html
index 4acfdb7..aa2edeb 100644
--- a/blog/documenting-using-local-development-environments/index.html
+++ b/blog/documenting-using-local-development-environments/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-11-22T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Documenting using local development
environment">
<meta itemprop="description" content="The story behind documenting local
development environment of Apache Airflow">
<meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="256">
diff --git
a/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html
b/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html
index 11ff5ce..7b0b568 100644
---
a/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html
+++
b/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/index.html
@@ -37,14 +37,14 @@ About Me I have been writing tech articles on medium as
well as my blog for the
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-12-20T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Experience in Google Season of Docs 2019 with
Apache Airflow">
<meta itemprop="description" content="I came across Google Season of Docs
(GSoD) almost by accident, thanks to my extensive HackerNews and Twitter
addiction. I was familiar with the Google Summer of Code but not with this
program. It turns out it was the inaugural phase. I read the details, and the
process felt a lot like GSoC except that this was about documentation.
About Me I have been writing tech articles on medium as well as my blog for
the past 1.">
<meta itemprop="datePublished" content="2019-12-20T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="1521">
diff --git a/blog/implementing-stable-api-for-apache-airflow/index.html
b/blog/implementing-stable-api-for-apache-airflow/index.html
index 7757cc3..671afb5 100644
--- a/blog/implementing-stable-api-for-apache-airflow/index.html
+++ b/blog/implementing-stable-api-for-apache-airflow/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2020-07-19T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Implementing Stable API for Apache Airflow">
<meta itemprop="description" content="An Outreachy intern's progress
report on contributing to Apache Airflow REST API.">
<meta itemprop="datePublished" content="2020-07-19T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="703">
@@ -596,10 +596,10 @@ wonderful reviews. I’m grateful.</p>
<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
>Previous</button>
</a>
- <a >
+ <a href="/blog/apache-airflow-for-new-comers/">
-<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
disabled>Next</button>
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue"
>Next</button>
</a>
</div>
@@ -612,6 +612,46 @@ wonderful reviews. I’m grateful.</p>
+ <div class="blog-pager">
+ <h5 class="header__xsmall--greyish-brown">Read also</h5>
+ <div class="pager">
+ <div class="list-items">
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
+ </div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Mon, Aug 17, 2020</span>
+ </div>
+ <p class="box-event__blogpost--header">Apache Airflow For Newcomers</p>
+ <p class="box-event__blogpost--author">Ephraim Anierobi</p>
+ <p class="box-event__blogpost--description"></p>
+ <div class="mt-auto">
+ <a href="/blog/apache-airflow-for-new-comers/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ </div>
+ </div>
+ </div>
+
</div>
diff --git a/blog/index.html b/blog/index.html
index 33442fa..b5951f1 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -456,6 +456,37 @@ if (!doNotTrack) {
<div class="tags-container">
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
+ </div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Mon, Aug 17, 2020</span>
+ </div>
+ <p class="box-event__blogpost--header">Apache Airflow For Newcomers</p>
+ <p class="box-event__blogpost--author">Ephraim Anierobi</p>
+ <p class="box-event__blogpost--description"></p>
+ <div class="mt-auto">
+ <a href="/blog/apache-airflow-for-new-comers/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
</div>
<span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Sun, Jul 19, 2020</span>
diff --git a/blog/index.xml b/blog/index.xml
index 34046f9..b89c645 100644
--- a/blog/index.xml
+++ b/blog/index.xml
@@ -14,6 +14,172 @@
<item>
+ <title>Blog: Apache Airflow For Newcomers</title>
+ <link>/blog/apache-airflow-for-new-comers/</link>
+ <pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
+
+ <guid>/blog/apache-airflow-for-new-comers/</guid>
+ <description>
+
+
+
+
+<p>Apache Airflow is a platform to programmatically author, schedule,
and monitor workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of a workflow as the
+path that tasks follow from not yet started to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.</p>
+
+<h3 id="authoring-workflow-in-apache-airflow">Authoring Workflow
in Apache Airflow</h3>
+
+<p>Airflow makes it easy to author workflows using Python scripts. A
<a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph"
target="_blank">Directed Acyclic Graph</a>
+(DAG) represents a workflow in Airflow. It is a collection of tasks organized
in a way that shows each task&rsquo;s
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them according to those relationships and dependencies. If task
B depends on the successful
+execution of another task A, Airflow will run task A first and run
task B only after task A succeeds.
+This dependency is easy to express in Airflow. For example, the above
scenario is expressed as</p>
+
+<pre><code class="language-python">task_A &gt;&gt;
task_B
+</code></pre>
+
+<p>This is equivalent to</p>
+
+<pre><code
class="language-python">task_A.set_downstream(task_B)
+</code></pre>
+
+<p><img src="Simple_dag.png" alt="Simple Dag"
/></p>
+
+<p>This tells Airflow that it needs to execute task A before
task B. Tasks can have far more complex
+relationships than the one above, and Airflow figures out how
and when to execute them by following
+their relationships and dependencies.
+<img src="semicomplex.png" alt="Complex Dag" /></p>
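How Airflow derives an execution order from such relationships can be modelled with a short pure-Python sketch (an illustration only, not Airflow's actual scheduler code; the `run_order` helper and the `deps` mapping are made up for this example):

```python
# Toy model of dependency-driven ordering: a task runs only after
# all of its upstream tasks have completed (Kahn-style topological sort).
def run_order(deps):
    """deps maps task -> set of upstream tasks it depends on."""
    done, order = set(), []
    pending = dict(deps)
    while pending:
        ready = [t for t, ups in pending.items() if ups <= done]
        if not ready:
            raise ValueError("cycle detected - not a valid DAG")
        for t in sorted(ready):  # sorted only to keep the output stable
            order.append(t)
            done.add(t)
            del pending[t]
    return order

# task_B depends on task_A; task_C depends on both
print(run_order({"task_A": set(),
                 "task_B": {"task_A"},
                 "task_C": {"task_A", "task_B"}}))
# ['task_A', 'task_B', 'task_C']
```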
+
+<p>Before we discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflows easy, let us discuss the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>.</p>
+
+<h3 id="breeze-environment">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow
where you can run tests, build images,
+build documentation, and much more. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out. You enter the Breeze environment by running the
<code>./breeze</code> script. You can run all
+the commands mentioned here in the Breeze environment.</p>
+
+<h3 id="scheduler">Scheduler</h3>
+
+<p>The scheduler is the component that monitors DAGs and triggers those
tasks whose dependencies have
+been met. It watches over the DAG folder, checks the tasks in each DAG, and
triggers them once they
+are ready. It accomplishes this by spawning a process that runs
periodically (every minute or so),
+reads the metadata database to check the status of each task, and decides
what needs to be done.
+The metadata database is where the status of every task is recorded. The
status can be running,
+success, failed, etc.</p>
+
+<p>A task is said to be ready when its dependencies have been met. The
dependencies include all the data
+necessary for the task to be executed. It should be noted that the scheduler
won&rsquo;t trigger your tasks until
+the period it covers has ended. If a task&rsquo;s
<code>schedule_interval</code> is <code>@daily</code>,
the scheduler triggers the task
+at the end of the day, not at the beginning. This ensures that the
data needed by the tasks
+is ready. It is also possible to trigger tasks manually from the UI.</p>
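The end-of-interval rule can be sketched in plain Python (a model of the behaviour, not Airflow code; `daily_trigger_time` is a hypothetical helper):

```python
from datetime import datetime, timedelta

def daily_trigger_time(interval_start):
    """For a @daily schedule_interval, the run covering a given day is
    triggered only after that day has ended, i.e. at midnight of the
    next day - not at the start of the day it covers."""
    return interval_start + timedelta(days=1)

# The run covering Aug 17 is triggered at the start of Aug 18
print(daily_trigger_time(datetime(2020, 8, 17)))  # 2020-08-18 00:00:00
```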
+
+<p>In the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>, the scheduler is
started by running the command <code>airflow scheduler</code>. It
uses
+the configured environment. The configuration can be specified in
<code>airflow.cfg</code>.</p>
+
+<h3 id="executor">Executor</h3>
+
+<p>Executors are responsible for running tasks. They work with the
scheduler to get information about
+what resources are needed to run a task as the task is queued.</p>
+
+<p>By default, Airflow uses the <a
href="https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor"
target="_blank">SequentialExecutor</a>.
+ However, this executor is limited and it is the only executor that can be
used with SQLite.</p>
+
+<p>There are many other <a
href="https://airflow.apache.org/docs/stable/executor/index.html"
target="_blank">executors</a>,
+ which differ in the resources they have and how they choose to use
them. The available executors
+ are:</p>
+
+<ul>
+<li>Sequential Executor</li>
+<li>Debug Executor</li>
+<li>Local Executor</li>
+<li>Dask Executor</li>
+<li>Celery Executor</li>
+<li>Kubernetes Executor</li>
+<li>Scaling Out with Mesos (community contributed)</li>
+</ul>
+
+<p>The CeleryExecutor is more capable than the
SequentialExecutor. The CeleryExecutor uses several
+workers to execute a job in a distributed way. If a worker node ever goes down,
the CeleryExecutor assigns its
+tasks to another worker node. This ensures high availability.</p>
+
+<p>The CeleryExecutor works closely with the scheduler, which adds a
message to the queue, and the Celery broker,
+which delivers the message to a Celery worker for execution.
+You can find more information about the CeleryExecutor and how to configure it
at the
+<a
href="https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor"
target="_blank">documentation</a>.</p>
+
+<h3 id="webserver">Webserver</h3>
+
+<p>The webserver is the web interface (UI) for Airflow. The UI is
feature-rich. It makes it easy to
+monitor and troubleshoot DAGs and tasks.</p>
+
+<p><img src="airflow-ui.png" alt="airflow UI"
/></p>
+
+<p>There are many actions you can perform in the UI. You can trigger a
task and monitor its execution,
+including its duration. The UI makes it possible to view a
task&rsquo;s dependencies in a
+tree view and a graph view, and you can also view task logs.</p>
+
+<p>The web UI is started with the command <code>airflow
webserver</code> in the Breeze environment.</p>
+
+<h3 id="backend">Backend</h3>
+
+<p>By default, Airflow uses the SQLite backend for storing the
configuration information, DAG states,
+and other useful information. It should not be used in production, as
SQLite can lead to data
+loss.</p>
+
+<p>You can use PostgreSQL or MySQL as a backend for Airflow. It is easy
to change to PostgreSQL or MySQL.</p>
+
+<p>The command <code>./breeze --backend mysql</code> selects
MySQL as the backend when starting the Breeze environment.</p>
+
+<h3 id="operators">Operators</h3>
+
+<p>Operators determine what gets done by a task. Airflow has many
built-in operators. Each operator
+performs a specific task. There&rsquo;s the BashOperator, which executes a bash
command; the PythonOperator, which
+calls a Python function; the AwsBatchOperator, which executes a job on AWS Batch;
and <a
href="https://airflow.apache.org/docs/stable/concepts.html#operators"
target="_blank">many more</a>.</p>
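The idea behind operators can be sketched in a few lines of plain Python (a simplified model; the `Toy*` class names are made up, and real Airflow operators carry many more features):

```python
import subprocess

class ToyBashOperator:
    """Simplified model of a BashOperator: executes a shell command."""
    def __init__(self, bash_command):
        self.bash_command = bash_command

    def execute(self):
        result = subprocess.run(self.bash_command, shell=True,
                                capture_output=True, text=True)
        return result.stdout.strip()

class ToyPythonOperator:
    """Simplified model of a PythonOperator: calls a Python function."""
    def __init__(self, python_callable):
        self.python_callable = python_callable

    def execute(self):
        return self.python_callable()

print(ToyBashOperator("echo hello").execute())      # hello
print(ToyPythonOperator(lambda: 40 + 2).execute())  # 42
```

Each operator exposes the same `execute` interface, which is what lets Airflow treat very different kinds of work uniformly.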
+
+<h4 id="sensors">Sensors</h4>
+
+<p>Sensors can be described as special operators that are used to
monitor a long-running task.
+Just like operators, there are many predefined sensors in Airflow. These
include:</p>
+
+<ul>
+<li>AthenaSensor: Polls the state of a query until it reaches a
failure or success state.</li>
+<li>AzureCosmosDocumentSensor: Checks for the existence of a document
which matches the given query in CosmosDB</li>
+<li>GoogleCloudStorageObjectSensor: Checks for the existence of a file
in Google Cloud Storage</li>
+</ul>
+
+<p>A list of most of the available sensors can be found in this <a
href="https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors"
target="_blank">module</a>.</p>
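Conceptually, a sensor is an operator whose execute method repeatedly "pokes" a condition until it succeeds or gives up. A minimal plain-Python sketch (illustrative only; real Airflow sensors also sleep between pokes and honour configurable timeouts):

```python
class ToySensor:
    """Simplified model of an Airflow sensor: polls a condition callable
    until it returns True or the allowed number of pokes is exhausted."""
    def __init__(self, condition, max_pokes=3):
        self.condition = condition
        self.max_pokes = max_pokes

    def execute(self):
        for attempt in range(1, self.max_pokes + 1):
            if self.condition():
                return attempt  # condition met on this poke
        raise TimeoutError("condition never became true")

# A condition that only becomes true on the third check
state = {"checks": 0}
def file_has_landed():
    state["checks"] += 1
    return state["checks"] >= 3

print(ToySensor(file_has_landed).execute())  # 3
```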
+
+<h3 id="breeze-environment-1">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow
where you can run tests, build images,
+build documentation, and much more. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out.</p>
+
+<h3 id="contributing-to-airflow">Contributing to
Airflow</h3>
+
+<p>Airflow is an open-source project, and everyone is welcome to contribute.
It is easy to get started thanks
+to the excellent <a
href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst"
target="_blank">documentation on how to get
started</a>.</p>
+
+<p>I joined the community about 12 weeks ago through the <a
href="https://www.outreachy.org/" target="_blank">Outreachy
Program</a> and have
+completed about <a
href="https://github.com/apache/airflow/pulls/ephraimbuddy"
target="_blank">40 PRs</a>.</p>
+
+<p>It has been an amazing experience! Thanks to my mentors <a
href="https://github.com/potiuk"
target="_blank">Jarek</a> and
+<a href="https://github.com/kaxil"
target="_blank">Kaxil</a>, and the community members
especially <a href="https://github.com/mik-laj"
target="_blank">Kamil</a>
+and <a href="https://github.com/turbaszek"
target="_blank">Tomek</a> for all their support. I&rsquo;m
grateful!</p>
+
+<p>Thank you so much, <a href="https://github.com/leahecole"
target="_blank">Leah E. Cole</a>, for your wonderful
reviews.</p>
+
+ </description>
+ </item>
+
+ <item>
<title>Blog: Implementing Stable API for Apache Airflow</title>
<link>/blog/implementing-stable-api-for-apache-airflow/</link>
<pubDate>Sun, 19 Jul 2020 00:00:00 +0000</pubDate>
diff --git a/blog/its-a-breeze-to-develop-apache-airflow/index.html
b/blog/its-a-breeze-to-develop-apache-airflow/index.html
index 72c7bd2..a8c7137 100644
--- a/blog/its-a-breeze-to-develop-apache-airflow/index.html
+++ b/blog/its-a-breeze-to-develop-apache-airflow/index.html
@@ -36,13 +36,13 @@
<meta property="og:image" content="/images/feature-image.png" />
<meta property="article:published_time" content="2019-11-22T00:00:00+00:00" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="It's a "Breeze" to develop Apache
Airflow">
<meta itemprop="description" content="A Principal Software Engineer's
journey to developer productivity. Learn how Jarek and his team speeded up and
simplified Airflow development for the community.">
<meta itemprop="datePublished" content="2019-11-22T00:00:00+00:00" />
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="127">
diff --git a/blog/semicomplex.png b/blog/semicomplex.png
new file mode 100644
index 0000000..847cc61
Binary files /dev/null and b/blog/semicomplex.png differ
diff --git a/blog/tags/community/index.html b/blog/tags/community/index.html
index 6bf1164..1d3f389 100644
--- a/blog/tags/community/index.html
+++ b/blog/tags/community/index.html
@@ -30,21 +30,21 @@
<meta name="msapplication-TileImage" content="/favicons/ms-icon-144x144.png">
<meta name="theme-color" content="#ffffff">
-<title>community | Apache Airflow</title><meta property="og:title"
content="community" />
+<title>Community | Apache Airflow</title><meta property="og:title"
content="Community" />
<meta property="og:description" content="Platform created by the community to
programmatically author, schedule and monitor workflows." />
<meta property="og:type" content="website" />
<meta property="og:url" content="/blog/tags/community/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="og:updated_time" content="2019-12-11T00:00:00+00:00" /><meta
property="og:site_name" content="Apache Airflow" />
-<meta itemprop="name" content="community">
+<meta property="og:updated_time" content="2020-08-17T00:00:00+00:00" /><meta
property="og:site_name" content="Apache Airflow" />
+<meta itemprop="name" content="Community">
<meta itemprop="description" content="Platform created by the community to
programmatically author, schedule and monitor workflows.">
<meta name="twitter:card" content="summary_large_image"/>
<meta name="twitter:image" content="/images/feature-image.png"/>
-<meta name="twitter:title" content="community"/>
+<meta name="twitter:title" content="Community"/>
<meta name="twitter:description" content="Platform created by the community to
programmatically author, schedule and monitor workflows."/>
@@ -468,6 +468,37 @@ if (!doNotTrack) {
<a class="tag"
href="/blog/tags/community/">Community</a>
+ </div>
+ <span class="bodytext__medium--brownish-grey
box-event__blogpost--date">Mon, Aug 17, 2020</span>
+ </div>
+ <p class="box-event__blogpost--header">Apache Airflow For Newcomers</p>
+ <p class="box-event__blogpost--author">Ephraim Anierobi</p>
+ <p class="box-event__blogpost--description"></p>
+ <div class="mt-auto">
+ <a href="/blog/apache-airflow-for-new-comers/">
+
+
+<button class="btn-hollow btn-blue bodytext__medium--cerulean-blue" >Read
more</button>
+
+ </a>
+ </div>
+ </div>
+</div>
+
+ </div>
+
+ <div class="list-item list-item--wide">
+
+
+<div class="card">
+ <div class="box-event__blogpost">
+ <div class="box-event__blogpost--metadata">
+ <div class="tags-container">
+
+
+ <a class="tag"
href="/blog/tags/community/">Community</a>
+
+
<a class="tag" href="/blog/tags/survey/">Survey</a>
diff --git a/blog/tags/community/index.xml b/blog/tags/community/index.xml
index 39dc48f..bc8a94d 100644
--- a/blog/tags/community/index.xml
+++ b/blog/tags/community/index.xml
@@ -1,10 +1,10 @@
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
- <title>Apache Airflow – community</title>
+ <title>Apache Airflow – Community</title>
<link>/blog/tags/community/</link>
- <description>Recent content in community on Apache Airflow</description>
+ <description>Recent content in Community on Apache Airflow</description>
<generator>Hugo -- gohugo.io</generator>
- <lastBuildDate>Wed, 11 Dec 2019 00:00:00 +0000</lastBuildDate>
+ <lastBuildDate>Mon, 17 Aug 2020 00:00:00 +0000</lastBuildDate>
<atom:link href="/blog/tags/community/index.xml" rel="self"
type="application/rss+xml" />
@@ -15,6 +15,172 @@
<item>
+ <title>Blog: Apache Airflow For Newcomers</title>
+ <link>/blog/apache-airflow-for-new-comers/</link>
+ <pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
+
+ <guid>/blog/apache-airflow-for-new-comers/</guid>
+ <description>
+
+
+
+
+<p>Apache Airflow is a platform to programmatically author, schedule,
and monitor workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of a workflow as the
+path that tasks follow from not yet started to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.</p>
+
+<h3 id="authoring-workflow-in-apache-airflow">Authoring Workflow
in Apache Airflow</h3>
+
+<p>Airflow makes it easy to author workflows using Python scripts. A
<a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph"
target="_blank">Directed Acyclic Graph</a>
+(DAG) represents a workflow in Airflow. It is a collection of tasks organized
in a way that shows each task&rsquo;s
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them according to those relationships and dependencies. If task
B depends on the successful
+execution of another task A, Airflow will run task A first and run
task B only after task A succeeds.
+This dependency is easy to express in Airflow. For example, the above
scenario is expressed as</p>
+
+<pre><code class="language-python">task_A &gt;&gt;
task_B
+</code></pre>
+
+<p>This is equivalent to</p>
+
+<pre><code
class="language-python">task_A.set_downstream(task_B)
+</code></pre>
+
+<p><img src="Simple_dag.png" alt="Simple Dag"
/></p>
+
+<p>This tells Airflow that it needs to execute task A before
task B. Tasks can have far more complex
+relationships than the one above, and Airflow figures out how
and when to execute them by following
+their relationships and dependencies.
+<img src="semicomplex.png" alt="Complex Dag" /></p>
+
+<p>Before we discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflows easy, let us discuss the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>.</p>
+
+<h3 id="breeze-environment">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow
where you can run tests, build images,
+build documentation, and much more. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out. You enter the Breeze environment by running the
<code>./breeze</code> script. You can run all
+the commands mentioned here in the Breeze environment.</p>
+
+<h3 id="scheduler">Scheduler</h3>
+
+<p>The scheduler is the component that monitors DAGs and triggers those
tasks whose dependencies have
+been met. It watches over the DAG folder, checks the tasks in each DAG, and
triggers them once they
+are ready. It accomplishes this by spawning a process that runs
periodically (every minute or so),
+reads the metadata database to check the status of each task, and decides
what needs to be done.
+The metadata database is where the status of every task is recorded. The
status can be running,
+success, failed, etc.</p>
+
+<p>A task is said to be ready when its dependencies have been met. The
dependencies include all the data
+necessary for the task to be executed. It should be noted that the scheduler
won&rsquo;t trigger your tasks until
+the period it covers has ended. If a task&rsquo;s
<code>schedule_interval</code> is <code>@daily</code>,
the scheduler triggers the task
+at the end of the day, not at the beginning. This ensures that the
data needed by the tasks
+is ready. It is also possible to trigger tasks manually from the UI.</p>
+
+<p>In the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>, the scheduler is
started by running the command <code>airflow scheduler</code>. It
uses
+the configured environment. The configuration can be specified in
<code>airflow.cfg</code>.</p>
+
+<h3 id="executor">Executor</h3>
+
+<p>Executors are responsible for running tasks. They work with the
scheduler to get information about
+what resources are needed to run a task as the task is queued.</p>
+
+<p>By default, Airflow uses the <a
href="https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor"
target="_blank">SequentialExecutor</a>.
+ However, this executor is limited and it is the only executor that can be
used with SQLite.</p>
+
+<p>There are many other <a
href="https://airflow.apache.org/docs/stable/executor/index.html"
target="_blank">executors</a>,
+ which differ in the resources they have and how they choose to use
them. The available executors
+ are:</p>
+
+<ul>
+<li>Sequential Executor</li>
+<li>Debug Executor</li>
+<li>Local Executor</li>
+<li>Dask Executor</li>
+<li>Celery Executor</li>
+<li>Kubernetes Executor</li>
+<li>Scaling Out with Mesos (community contributed)</li>
+</ul>
+
+<p>The CeleryExecutor is more capable than the
SequentialExecutor. The CeleryExecutor uses several
+workers to execute a job in a distributed way. If a worker node ever goes down,
the CeleryExecutor assigns its
+tasks to another worker node. This ensures high availability.</p>
+
+<p>The CeleryExecutor works closely with the scheduler, which adds a
message to the queue, and the Celery broker,
+which delivers the message to a Celery worker for execution.
+You can find more information about the CeleryExecutor and how to configure it
at the
+<a
href="https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor"
target="_blank">documentation</a>.</p>
+
+<h3 id="webserver">Webserver</h3>
+
+<p>The webserver is the web interface (UI) for Airflow. The UI is
feature-rich. It makes it easy to
+monitor and troubleshoot DAGs and Tasks.</p>
+
+<p><img src="airflow-ui.png" alt="airflow UI"
/></p>
+
+<p>There are many actions you can perform in the UI. You can trigger a
task and monitor its execution,
+including its duration. The UI makes it possible to view a
task&rsquo;s dependencies in a
+tree view and a graph view. You can also view task logs in the UI.</p>
+
+<p>The web UI is started with the command <code>airflow
webserver</code> in the Breeze environment.</p>
+
+<h3 id="backend">Backend</h3>
+
+<p>By default, Airflow uses the SQLite backend for storing
configuration information, DAG states,
+and other useful information. It should not be used in production, as
SQLite can cause data
+loss.</p>
+
+<p>You can use PostgreSQL or MySQL as the backend for Airflow, and it is easy
to switch to either of them.</p>
+
+<p>The command <code>./breeze --backend mysql</code> selects
MySQL as the backend when starting the Breeze environment.</p>
+
+<h3 id="operators">Operators</h3>
+
+<p>Operators determine what gets done by a task. Airflow has a lot of
built-in operators, each performing
+a specific task. There&rsquo;s the BashOperator that executes a bash
command, the PythonOperator which
+calls a Python function, the AwsBatchOperator which executes a job on AWS Batch,
and <a
href="https://airflow.apache.org/docs/stable/concepts.html#operators"
target="_blank">many more</a>.</p>
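+
+<p>To illustrate, the two operators mentioned above could be declared as
+follows (the task ids, the <code>greet</code> callable, and the
+surrounding <code>dag</code> object are made up for the example):</p>
+
+<pre><code class="language-python">from airflow.operators.bash_operator import BashOperator
+from airflow.operators.python_operator import PythonOperator
+
+
+def greet():
+    print("Hello from Airflow")
+
+
+# Both operators are attached to an existing DAG object named `dag`
+say_date = BashOperator(task_id="say_date", bash_command="date", dag=dag)
+say_hello = PythonOperator(task_id="say_hello", python_callable=greet, dag=dag)
+
+# Run the bash task first, then the Python task
+say_date &gt;&gt; say_hello
+</code></pre>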
+
+<h4 id="sensors">Sensors</h4>
+
+<p>Sensors can be described as special operators that are used to
monitor a long-running task.
+Just like operators, there are many predefined sensors in Airflow. These
include:</p>
+
+<ul>
+<li>AthenaSensor: Polls the state of a query until it reaches a
failure or success state.</li>
+<li>AzureCosmosDocumentSensor: Checks for the existence of a document
which matches the given query in CosmosDB</li>
+<li>GoogleCloudStorageObjectSensor: Checks for the existence of a file
in Google Cloud Storage</li>
+</ul>
+
+<p>A list of most of the available sensors can be found in this <a
href="https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors"
target="_blank">module</a>.</p>
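+
+<p>All sensors share the same idea: check a condition repeatedly until it
+becomes true. A custom sensor is a small subclass of
+<code>BaseSensorOperator</code>; the following file-existence sensor is a
+hypothetical example, not one of Airflow&rsquo;s built-in sensors:</p>
+
+<pre><code class="language-python">import os
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+
+
+class FileExistsSensor(BaseSensorOperator):
+    def __init__(self, filepath, *args, **kwargs):
+        super(FileExistsSensor, self).__init__(*args, **kwargs)
+        self.filepath = filepath
+
+    def poke(self, context):
+        # Called on every poke interval until it returns True
+        # or the sensor times out
+        return os.path.exists(self.filepath)
+</code></pre>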
+
+<h3 id="breeze-environment-1">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow,
where you can run tests, build images,
+build documentation, and do many other things. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out.</p>
+
+<h3 id="contributing-to-airflow">Contributing to
Airflow</h3>
+
+<p>Airflow is an open-source project; everyone is welcome to contribute.
It is easy to get started thanks
+to the excellent <a
href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst"
target="_blank">documentation on how to get
started</a>.</p>
+
+<p>I joined the community about 12 weeks ago through the <a
href="https://www.outreachy.org/" target="_blank">Outreachy
Program</a> and have
+completed about <a
href="https://github.com/apache/airflow/pulls/ephraimbuddy"
target="_blank">40 PRs</a>.</p>
+
+<p>It has been an amazing experience! Thanks to my mentors <a
href="https://github.com/potiuk"
target="_blank">Jarek</a> and
+<a href="https://github.com/kaxil"
target="_blank">Kaxil</a>, and the community members
especially <a href="https://github.com/mik-laj"
target="_blank">Kamil</a>
+and <a href="https://github.com/turbaszek"
target="_blank">Tomek</a> for all their support. I&rsquo;m
grateful!</p>
+
+<p>Thank you so much, <a href="https://github.com/leahecole"
target="_blank">Leah E. Cole</a>, for your wonderful
reviews.</p>
+
+ </description>
+ </item>
+
+ <item>
<title>Blog: Airflow Survey 2019</title>
<link>/blog/airflow-survey/</link>
<pubDate>Wed, 11 Dec 2019 00:00:00 +0000</pubDate>
diff --git a/index.html b/index.html
index c05e5e8..1e93c07 100644
--- a/index.html
+++ b/index.html
@@ -1198,12 +1198,12 @@ if (!doNotTrack) {
<div id="integrations-container" class="list-items">
- <a class="list-item"
href="/docs/stable/integration.html#service-integrations">
+ <a class="list-item"
href="/docs/stable/integration.html#gcp-google-cloud-platform">
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Atlassian Jira</span>
+ <span class="box-event__integration--name">Storage Transfer
Service</span>
</div>
</div>
@@ -1216,7 +1216,7 @@ if (!doNotTrack) {
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Cloud Build</span>
+ <span class="box-event__integration--name">Cloud Data Loss Prevention
(DLP)</span>
</div>
</div>
@@ -1224,12 +1224,12 @@ if (!doNotTrack) {
- <a class="list-item"
href="/docs/stable/integration.html#software-integrations">
+ <a class="list-item"
href="/docs/stable/integration.html#gcp-google-cloud-platform">
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">MySQL</span>
+ <span class="box-event__integration--name">Cloud Build</span>
</div>
</div>
@@ -1237,12 +1237,12 @@ if (!doNotTrack) {
- <a class="list-item"
href="/docs/stable/integration.html#azure-microsoft-azure">
+ <a class="list-item"
href="/docs/stable/integration.html#software-integrations">
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Azure Cosmos DB</span>
+ <span class="box-event__integration--name">Docker</span>
</div>
</div>
@@ -1255,7 +1255,7 @@ if (!doNotTrack) {
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Apache Pinot</span>
+ <span class="box-event__integration--name">Apache Hive</span>
</div>
</div>
@@ -1263,12 +1263,12 @@ if (!doNotTrack) {
- <a class="list-item"
href="/docs/stable/integration.html#gcp-google-cloud-platform">
+ <a class="list-item"
href="/docs/stable/integration.html#software-integrations">
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Cloud Vision</span>
+ <span class="box-event__integration--name">Celery</span>
</div>
</div>
@@ -1281,7 +1281,7 @@ if (!doNotTrack) {
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Docker</span>
+ <span class="box-event__integration--name">Samba</span>
</div>
</div>
@@ -1289,12 +1289,12 @@ if (!doNotTrack) {
- <a class="list-item"
href="/docs/stable/integration.html#software-integrations">
+ <a class="list-item"
href="/docs/stable/integration.html#gcp-google-cloud-platform">
<div class="card">
<div class="box-event box-event__integration">
- <span class="box-event__integration--name">Kubernetes</span>
+ <span class="box-event__integration--name">Dataflow</span>
</div>
</div>
diff --git a/index.xml b/index.xml
index 655ef10..090cc66 100644
--- a/index.xml
+++ b/index.xml
@@ -13,6 +13,172 @@
<item>
+ <title>Blog: Apache Airflow For Newcomers</title>
+ <link>/blog/apache-airflow-for-new-comers/</link>
+ <pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate>
+
+ <guid>/blog/apache-airflow-for-new-comers/</guid>
+ <description>
+
+
+
+
+<p>Apache Airflow is a platform to programmatically author, schedule,
and monitor workflows.
+A workflow is a sequence of tasks that processes a set of data. You can think
of a workflow as the
+path that describes how tasks go from not yet done to done. Scheduling, on the
other hand, is the
+process of planning, controlling, and optimizing when a particular task should
be done.</p>
+
+<h3 id="authoring-workflow-in-apache-airflow">Authoring Workflows
in Apache Airflow</h3>
+
+<p>Airflow makes it easy to author workflows using Python scripts. A
<a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph"
target="_blank">Directed Acyclic Graph</a>
+(DAG) represents a workflow in Airflow. It is a collection of tasks in a way
that shows each task&rsquo;s
+relationships and dependencies. You can have as many DAGs as you want, and
Airflow will execute
+them according to the task&rsquo;s relationships and dependencies. If task
B depends on the successful
+execution of another task A, it means Airflow will run task A and only run
task B after task A.
+This dependency is very easy to express in Airflow. For example, the above
scenario is expressed as</p>
+
+<pre><code class="language-python">task_A &gt;&gt;
task_B
+</code></pre>
+
+<p>This is equivalent to:</p>
+
+<pre><code
class="language-python">task_A.set_downstream(task_B)
+</code></pre>
+
+<p><img src="Simple_dag.png" alt="Simple Dag"
/></p>
+
+<p>That tells Airflow that it needs to execute task A before
task B. Tasks can have far more complex
+relationships to each other than expressed above, and Airflow figures out how
and when to execute the tasks following
+their relationships and dependencies.
+<img src="semicomplex.png" alt="Complex Dag" /></p>
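+
+<p>Putting the pieces together, a minimal DAG with a fan-out dependency
+might look like the following sketch (the DAG id, dates, and task ids are
+illustrative only):</p>
+
+<pre><code class="language-python">from datetime import datetime
+
+from airflow import DAG
+from airflow.operators.dummy_operator import DummyOperator
+
+with DAG(dag_id="example_dependencies",
+         start_date=datetime(2020, 1, 1),
+         schedule_interval="@daily") as dag:
+    task_a = DummyOperator(task_id="task_a")
+    task_b = DummyOperator(task_id="task_b")
+    task_c = DummyOperator(task_id="task_c")
+
+    # task_a runs first; task_b and task_c run after it succeeds
+    task_a &gt;&gt; [task_b, task_c]
+</code></pre>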
+
+<p>Before we discuss the architecture of Airflow that makes scheduling,
executing, and monitoring of
+workflow an easy thing, let us discuss the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>.</p>
+
+<h3 id="breeze-environment">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow,
where you can run tests, build images,
+build documentation, and do many other things. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out. You enter the Breeze environment by running the
<code>./breeze</code> script. You can run all
+the commands mentioned here in the Breeze environment.</p>
+
+<h3 id="scheduler">Scheduler</h3>
+
+<p>The scheduler is the component that monitors DAGs and triggers those
tasks whose dependencies have
+been met. It watches over the DAG folder, checking the tasks in each DAG and
triggers them once they
+are ready. It accomplishes this by spawning a process that runs
periodically (every minute or so),
+reading the metadata database to check the status of each task and deciding
what needs to be done.
+The metadata database is where the status of every task is recorded. The
status can be one of running,
+ success, failed, etc.</p>
+
+<p>A task is said to be ready when its dependencies have been met. The
dependencies include all the data
+necessary for the task to be executed. It should be noted that the scheduler
won&rsquo;t trigger your tasks until
+the period it covers has ended. If a task&rsquo;s
<code>schedule_interval</code> is <code>@daily</code>,
the scheduler triggers the task
at the end of the day and not at the beginning. This is to ensure that the
data needed for the tasks
+is ready. It is also possible to trigger tasks manually from the UI.</p>
+
+<p>In the <a
href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">Breeze environment</a>, the scheduler is
started by running the command <code>airflow scheduler</code>. It
uses
+the configured production environment. The configuration can be specified in
<code>airflow.cfg</code>.</p>
+
+<h3 id="executor">Executor</h3>
+
+<p>Executors are responsible for running tasks. They work with the
scheduler to get information about
+what resources are needed to run a task as the task is queued.</p>
+
+<p>By default, Airflow uses the <a
href="https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor"
target="_blank">SequentialExecutor</a>.
+ However, this executor is limited and it is the only executor that can be
used with SQLite.</p>
+
+<p>There are many other <a
href="https://airflow.apache.org/docs/stable/executor/index.html"
target="_blank">executors</a>,
+ they differ in the resources they have and in how they use those
resources. The available executors
+ are:</p>
+
+<ul>
+<li>Sequential Executor</li>
+<li>Debug Executor</li>
+<li>Local Executor</li>
+<li>Dask Executor</li>
+<li>Celery Executor</li>
+<li>Kubernetes Executor</li>
+<li>Scaling Out with Mesos (community contributed)</li>
+</ul>
+
+<p>The CeleryExecutor is a more capable executor than the
SequentialExecutor. The CeleryExecutor uses several
+workers to execute a job in a distributed way. If a worker node ever goes down,
the CeleryExecutor assigns its
+tasks to another worker node. This ensures high availability.</p>
+
+<p>The CeleryExecutor works closely with the scheduler, which adds a
message to the queue, and the Celery broker,
+which delivers the message to a Celery worker to execute.
+You can find more information about the CeleryExecutor and how to configure it
in the
+<a
href="https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor"
target="_blank">documentation</a>.</p>
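+
+<p>As a rough sketch, enabling the CeleryExecutor mainly involves a few
+settings in <code>airflow.cfg</code> (the broker and result backend URLs
+below are placeholders, not recommendations):</p>
+
+<pre><code>[core]
+executor = CeleryExecutor
+
+[celery]
+broker_url = redis://localhost:6379/0
+result_backend = db+postgresql://user:password@localhost/airflow
+</code></pre>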
+
+<h3 id="webserver">Webserver</h3>
+
+<p>The webserver is the web interface (UI) for Airflow. The UI is
feature-rich. It makes it easy to
+monitor and troubleshoot DAGs and Tasks.</p>
+
+<p><img src="airflow-ui.png" alt="airflow UI"
/></p>
+
+<p>There are many actions you can perform in the UI. You can trigger a
task and monitor its execution,
+including its duration. The UI makes it possible to view a
task&rsquo;s dependencies in a
+tree view and a graph view. You can also view task logs in the UI.</p>
+
+<p>The web UI is started with the command <code>airflow
webserver</code> in the Breeze environment.</p>
+
+<h3 id="backend">Backend</h3>
+
+<p>By default, Airflow uses the SQLite backend for storing
configuration information, DAG states,
+and other useful information. It should not be used in production, as
SQLite can cause data
+loss.</p>
+
+<p>You can use PostgreSQL or MySQL as the backend for Airflow, and it is easy
to switch to either of them.</p>
+
+<p>The command <code>./breeze --backend mysql</code> selects
MySQL as the backend when starting the Breeze environment.</p>
+
+<h3 id="operators">Operators</h3>
+
+<p>Operators determine what gets done by a task. Airflow has a lot of
built-in operators, each performing
+a specific task. There&rsquo;s the BashOperator that executes a bash
command, the PythonOperator which
+calls a Python function, the AwsBatchOperator which executes a job on AWS Batch,
and <a
href="https://airflow.apache.org/docs/stable/concepts.html#operators"
target="_blank">many more</a>.</p>
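+
+<p>To illustrate, the two operators mentioned above could be declared as
+follows (the task ids, the <code>greet</code> callable, and the
+surrounding <code>dag</code> object are made up for the example):</p>
+
+<pre><code class="language-python">from airflow.operators.bash_operator import BashOperator
+from airflow.operators.python_operator import PythonOperator
+
+
+def greet():
+    print("Hello from Airflow")
+
+
+# Both operators are attached to an existing DAG object named `dag`
+say_date = BashOperator(task_id="say_date", bash_command="date", dag=dag)
+say_hello = PythonOperator(task_id="say_hello", python_callable=greet, dag=dag)
+
+# Run the bash task first, then the Python task
+say_date &gt;&gt; say_hello
+</code></pre>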
+
+<h4 id="sensors">Sensors</h4>
+
+<p>Sensors can be described as special operators that are used to
monitor a long-running task.
+Just like operators, there are many predefined sensors in Airflow. These
include:</p>
+
+<ul>
+<li>AthenaSensor: Polls the state of a query until it reaches a
failure or success state.</li>
+<li>AzureCosmosDocumentSensor: Checks for the existence of a document
which matches the given query in CosmosDB</li>
+<li>GoogleCloudStorageObjectSensor: Checks for the existence of a file
in Google Cloud Storage</li>
+</ul>
+
+<p>A list of most of the available sensors can be found in this <a
href="https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors"
target="_blank">module</a>.</p>
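+
+<p>All sensors share the same idea: check a condition repeatedly until it
+becomes true. A custom sensor is a small subclass of
+<code>BaseSensorOperator</code>; the following file-existence sensor is a
+hypothetical example, not one of Airflow&rsquo;s built-in sensors:</p>
+
+<pre><code class="language-python">import os
+
+from airflow.sensors.base_sensor_operator import BaseSensorOperator
+
+
+class FileExistsSensor(BaseSensorOperator):
+    def __init__(self, filepath, *args, **kwargs):
+        super(FileExistsSensor, self).__init__(*args, **kwargs)
+        self.filepath = filepath
+
+    def poke(self, context):
+        # Called on every poke interval until it returns True
+        # or the sensor times out
+        return os.path.exists(self.filepath)
+</code></pre>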
+
+<h3 id="breeze-environment-1">Breeze Environment</h3>
+
+<p>The Breeze environment is the development environment for Airflow,
where you can run tests, build images,
+build documentation, and do many other things. There is excellent
+<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst"
target="_blank">documentation and a video</a> on the Breeze
environment.
+Please check them out.</p>
+
+<h3 id="contributing-to-airflow">Contributing to
Airflow</h3>
+
+<p>Airflow is an open-source project; everyone is welcome to contribute.
It is easy to get started thanks
+to the excellent <a
href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst"
target="_blank">documentation on how to get
started</a>.</p>
+
+<p>I joined the community about 12 weeks ago through the <a
href="https://www.outreachy.org/" target="_blank">Outreachy
Program</a> and have
+completed about <a
href="https://github.com/apache/airflow/pulls/ephraimbuddy"
target="_blank">40 PRs</a>.</p>
+
+<p>It has been an amazing experience! Thanks to my mentors <a
href="https://github.com/potiuk"
target="_blank">Jarek</a> and
+<a href="https://github.com/kaxil"
target="_blank">Kaxil</a>, and the community members
especially <a href="https://github.com/mik-laj"
target="_blank">Kamil</a>
+and <a href="https://github.com/turbaszek"
target="_blank">Tomek</a> for all their support. I&rsquo;m
grateful!</p>
+
+<p>Thank you so much, <a href="https://github.com/leahecole"
target="_blank">Leah E. Cole</a>, for your wonderful
reviews.</p>
+
+ </description>
+ </item>
+
+ <item>
<title>Blog: Implementing Stable API for Apache Airflow</title>
<link>/blog/implementing-stable-api-for-apache-airflow/</link>
<pubDate>Sun, 19 Jul 2020 00:00:00 +0000</pubDate>
diff --git a/search/index.html b/search/index.html
index d6bf358..4581b1d 100644
--- a/search/index.html
+++ b/search/index.html
@@ -35,12 +35,12 @@
<meta property="og:url" content="/search/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Search Results">
<meta itemprop="description" content="">
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="0">
diff --git a/sitemap.xml b/sitemap.xml
index 082b908..5e7d819 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -4,202 +4,207 @@
<url>
<loc>/docs/overview/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/beds/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/ponycopters/configuring-ponycopters/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/getting-started/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/examples/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/ponycopters/launching-ponycopters/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tutorials/multi-bear/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/porridge/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/concepts/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/task/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tutorials/tutorial2/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tutorials/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/reference/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/contribution-guidelines/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
- <loc>/announcements/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <loc>/blog/apache-airflow-for-new-comers/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
- <loc>/blog/implementing-stable-api-for-apache-airflow/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <loc>/blog/tags/community/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
- <loc>/blog/tags/rest-api/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <loc>/tags/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
- <loc>/tags/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <loc>/announcements/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
+ </url>
+
+ <url>
+ <loc>/blog/implementing-stable-api-for-apache-airflow/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
+ </url>
+
+ <url>
+ <loc>/blog/tags/rest-api/</loc>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/airflow-1.10.10/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/tags/release/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/airflow-1.10.8-1.10.9/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/tags/documentation/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/airflow-survey/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
- </url>
-
- <url>
- <loc>/blog/tags/community/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/announcing-new-website/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/tags/survey/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/tags/users/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/tags/development/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/documenting-using-local-development-environments/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/its-a-breeze-to-develop-apache-airflow/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/getting-started/example-page/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/reference/parameter-reference/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/docs/tasks/ponycopters/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/adobe/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/big-fish-games/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/blog/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
@@ -208,57 +213,57 @@
<url>
<loc>/community/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/dish/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/experity/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/install/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/meetups/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/onefootball/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/privacy-notice/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/roadmap/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/search/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
<url>
<loc>/use-cases/</loc>
- <lastmod>2020-08-10T00:48:30+05:30</lastmod>
+ <lastmod>2020-08-17T20:02:22+01:00</lastmod>
</url>
</urlset>
\ No newline at end of file
diff --git a/tags/index.html b/tags/index.html
index 6df05af..3a80612 100644
--- a/tags/index.html
+++ b/tags/index.html
@@ -37,7 +37,7 @@
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="og:updated_time" content="2020-07-19T00:00:00+00:00" /><meta
property="og:site_name" content="Apache Airflow" />
+<meta property="og:updated_time" content="2020-08-17T00:00:00+00:00" /><meta
property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Tags">
<meta itemprop="description" content="Platform created by the community to
programmatically author, schedule and monitor workflows.">
diff --git a/tags/index.xml b/tags/index.xml
index 4488a13..5eef421 100644
--- a/tags/index.xml
+++ b/tags/index.xml
@@ -4,7 +4,7 @@
<link>/tags/</link>
<description>Recent content in Tags on Apache Airflow</description>
<generator>Hugo -- gohugo.io</generator>
- <lastBuildDate>Sun, 19 Jul 2020 00:00:00 +0000</lastBuildDate>
+ <lastBuildDate>Mon, 17 Aug 2020 00:00:00 +0000</lastBuildDate>
<atom:link href="/tags/index.xml" rel="self"
type="application/rss+xml" />
diff --git a/use-cases/adobe/index.html b/use-cases/adobe/index.html
index f105a91..6c2c827 100644
--- a/use-cases/adobe/index.html
+++ b/use-cases/adobe/index.html
@@ -35,12 +35,12 @@
<meta property="og:url" content="/use-cases/adobe/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Adobe">
<meta itemprop="description" content="What was the problem? Modern big data
platforms need sophisticated data pipelines connecting to many backend services
enabling complex workflows. These workflows need to be deployed, monitored, and
run either on regular schedules or triggered by external events. Adobe
Experience Platform component services architected and built an orchestration
service to enable their users to author, schedule, and monitor complex
hierarchical (including sequential a [...]
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="251">
diff --git a/use-cases/big-fish-games/index.html
b/use-cases/big-fish-games/index.html
index 7f28c13..28c30c0 100644
--- a/use-cases/big-fish-games/index.html
+++ b/use-cases/big-fish-games/index.html
@@ -35,12 +35,12 @@
<meta property="og:url" content="/use-cases/big-fish-games/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Big Fish Games">
<meta itemprop="description" content="What was the problem? The main challenge
is the lack of standardized ETL workflow orchestration tools. PowerShell and
Python-based ETL frameworks built in-house are currently used for scheduling
and running analytical workloads. However, there is no web UI through which we
can monitor these workflows and it requires additional effort to maintain this
framework. These scheduled jobs based on external dependencies are not well
suited to modern Big Data [...]
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="336">
diff --git a/use-cases/dish/index.html b/use-cases/dish/index.html
index 70e397f..f2b202b 100644
--- a/use-cases/dish/index.html
+++ b/use-cases/dish/index.html
@@ -35,12 +35,12 @@
<meta property="og:url" content="/use-cases/dish/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Dish">
<meta itemprop="description" content="What was the problem? We faced
increasing complexity managing lengthy crontabs with scheduling being an issue,
this required carefully planning timing due to resource constraints, usage
patterns, and especially custom code needed for retry logic. In the last case,
having to verify success of previous jobs and/or steps prior to running the
next. Furthermore, time to results is important, but we were increasingly
relying on buffers for processing, wher [...]
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="249">
diff --git a/use-cases/experity/index.html b/use-cases/experity/index.html
index d8c5a0f..db8cded 100644
--- a/use-cases/experity/index.html
+++ b/use-cases/experity/index.html
@@ -36,13 +36,13 @@ How did Apache Airflow help to solve this problem?
Ultimately we decided flexibl
<meta property="og:url" content="/use-cases/experity/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Experity">
<meta itemprop="description" content="What was the problem? We had to deploy
our complex, flagship app to multiple nodes in multiple ways. This required
tasks to communicate across Windows nodes and coordinate timing perfectly. We
did not want to buy an expensive enterprise scheduling tool and needed ultimate
flexibility.
How did Apache Airflow help to solve this problem? Ultimately we decided
flexible, multi-node, DAG capable tooling was key and airflow was one of the
few tools that fit that bill.">
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="191">
diff --git a/use-cases/onefootball/index.html b/use-cases/onefootball/index.html
index bd76bd4..013cd0e 100644
--- a/use-cases/onefootball/index.html
+++ b/use-cases/onefootball/index.html
@@ -36,13 +36,13 @@ On top of that, new data tools appear each month: third
party data sources, clou
<meta property="og:url" content="/use-cases/onefootball/" />
<meta property="og:image" content="/images/feature-image.png" />
-<meta property="article:modified_time" content="2020-08-10T00:48:30+05:30"
/><meta property="og:site_name" content="Apache Airflow" />
+<meta property="article:modified_time" content="2020-08-17T20:02:22+01:00"
/><meta property="og:site_name" content="Apache Airflow" />
<meta itemprop="name" content="Onefootball">
<meta itemprop="description" content="What was the problem? With millions of
daily active users, managing the complexity of data engineering at Onefootball
is a constant challenge. Lengthy crontabs, multiplication of custom API
clients, erosion of confidence in the analytics served, increasing heroism
(“only one person can solve this issue”). Those are the challenges
that most teams face unless they consciously invest in their tools and
processes.
On top of that, new data tools appear each month: third party data sources,
cloud providers solutions, different storage technologies… Managing all
those integrations is costly and brittle, especially for small data engineering
teams that are trying to do more with less.">
-<meta itemprop="dateModified" content="2020-08-10T00:48:30+05:30" />
+<meta itemprop="dateModified" content="2020-08-17T20:02:22+01:00" />
<meta itemprop="wordCount" content="294">