This is an automated email from the ASF dual-hosted git repository.
basph pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new 32f5eb1e8d Rewrite the Airflow documentation home page (#24795)
32f5eb1e8d is described below
commit 32f5eb1e8da837eac7cd475f8a7baa9ed21fa351
Author: Bas Harenslak <[email protected]>
AuthorDate: Mon Jul 11 15:16:53 2022 +0200
Rewrite the Airflow documentation home page (#24795)
* Rewrite the Airflow home page
* Rename home to overview
* Ignore parameterization
* Remove history since that can be read elsewhere
* Link companies to inthewild.md
* Process comment
---
docs/apache-airflow/img/airflow.gif | Bin 416302 -> 0 bytes
docs/apache-airflow/img/hello_world_graph_view.png | Bin 0 -> 73688 bytes
docs/apache-airflow/img/hello_world_grid_view.png | Bin 0 -> 132851 bytes
docs/apache-airflow/index.rst | 121 +++++++++++++++------
docs/spelling_wordlist.txt | 1 +
5 files changed, 86 insertions(+), 36 deletions(-)
diff --git a/docs/apache-airflow/img/airflow.gif b/docs/apache-airflow/img/airflow.gif
deleted file mode 100644
index 076fe8e978..0000000000
Binary files a/docs/apache-airflow/img/airflow.gif and /dev/null differ
diff --git a/docs/apache-airflow/img/hello_world_graph_view.png b/docs/apache-airflow/img/hello_world_graph_view.png
new file mode 100644
index 0000000000..18ef6eb4ff
Binary files /dev/null and b/docs/apache-airflow/img/hello_world_graph_view.png differ
diff --git a/docs/apache-airflow/img/hello_world_grid_view.png b/docs/apache-airflow/img/hello_world_grid_view.png
new file mode 100644
index 0000000000..e2140c17eb
Binary files /dev/null and b/docs/apache-airflow/img/hello_world_grid_view.png differ
diff --git a/docs/apache-airflow/index.rst b/docs/apache-airflow/index.rst
index d6a781e0c1..66ac7f9d45 100644
--- a/docs/apache-airflow/index.rst
+++ b/docs/apache-airflow/index.rst
@@ -15,65 +15,114 @@
specific language governing permissions and limitations
under the License.
+What is Airflow?
+=========================================
+
+`Apache Airflow <https://github.com/apache/airflow>`_ is an open-source platform for developing, scheduling,
+and monitoring batch-oriented workflows. Airflow's extensible Python framework enables you to build workflows
+connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is
+deployable in many ways, varying from a single process on your laptop to a distributed setup to support even
+the biggest workflows.
+
+Workflows as code
+=========================================
+
+The main characteristic of Airflow workflows is that all workflows are defined in Python code. "Workflows as
+code" serves several purposes:
+
+- **Dynamic**: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.
+- **Extensible**: The Airflow framework contains operators to connect with numerous technologies. All Airflow components are extensible to easily adjust to your environment.
+- **Flexible**: Workflow parameterization is built-in leveraging the `Jinja <https://jinja.palletsprojects.com>`_ templating engine.
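For a rough sense of what that templating means: Airflow renders placeholders such as ``{{ ds }}`` (the run's logical date) inside operator fields before a task executes. A minimal stand-in sketch of that idea, not Airflow's actual Jinja rendering code:

```python
import re

# Hypothetical stand-in for Airflow's template rendering: replace
# {{ name }} placeholders in a templated field with values from a
# context dictionary.
def render(template: str, context: dict) -> str:
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )

# "ds" is one of the values Airflow injects into the template context
bash_command = "echo processing data for {{ ds }}"
print(render(bash_command, {"ds": "2022-01-01"}))
# prints: echo processing data for 2022-01-01
```

In real Airflow the full Jinja engine is used, so templates can also contain filters and expressions, not just simple substitution.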
-.. image:: ../../airflow/www/static/pin_large.png
- :width: 100
+Take a look at the following snippet of code:
-Apache Airflow Documentation
-=========================================
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.bash import BashOperator
+    from airflow.operators.python import PythonOperator
+
+    # A DAG represents a workflow, a collection of tasks
+    with DAG(dag_id="demo", start_date=datetime(2022, 1, 1), schedule_interval="0 0 * * *") as dag:
-Airflow is a platform to programmatically author, schedule and monitor
-workflows.
+        # Tasks are represented as operators
+        hello = BashOperator(task_id="hello", bash_command="echo hello")
+        airflow = PythonOperator(task_id="airflow", python_callable=lambda: print("airflow"))
-Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks.
-The Airflow scheduler executes your tasks on an array of workers while
-following the specified dependencies. Rich command line utilities make
-performing complex surgeries on DAGs a snap. The rich user interface
-makes it easy to visualize pipelines running in production,
-monitor progress, and troubleshoot issues when needed.
+        # Set dependencies between tasks
+        hello >> airflow
-When workflows are defined as code, they become more maintainable,
-versionable, testable, and collaborative.
+Here you see:
+
+- A DAG named "demo", starting on Jan 1st 2022 and running once a day. A DAG is Airflow's representation of a workflow.
+- Two tasks, a BashOperator running a Bash script and a PythonOperator running a Python script
+- ``>>`` between the tasks defines a dependency and controls in which order the tasks will be executed
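The ``>>`` syntax is ordinary Python operator overloading: tasks implement the right-shift operator to record dependencies. A simplified toy version of the idea (not Airflow's actual implementation):

```python
class Task:
    """Toy stand-in for an Airflow operator, only to illustrate ``>>``."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.downstream: list["Task"] = []

    def __rshift__(self, other: "Task") -> "Task":
        # "self >> other" records that `other` runs after `self`
        self.downstream.append(other)
        return other  # returning `other` allows chaining: a >> b >> c


hello = Task("hello")
airflow = Task("airflow")
hello >> airflow

print([t.task_id for t in hello.downstream])
# prints: ['airflow']
```

Because ``__rshift__`` returns its right-hand operand, dependencies can be chained in one line, which is why Airflow DAG files read almost like a diagram of the pipeline.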
-.. image:: img/airflow.gif
+Airflow evaluates this script and executes the tasks at the set interval and in the defined order. The status
+of the "demo" DAG is visible in the web interface:
-------------
+.. image:: /img/hello_world_graph_view.png
+  :alt: Demo DAG in the Graph View, showing the status of one DAG run
+
-Principles
-----------
+This example demonstrates a simple Bash and Python script, but these tasks can run any arbitrary code. Think
+of running a Spark job, moving data between two buckets, or sending an email. The same structure can also be
+seen running over time:
-- **Dynamic**: Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically.
-- **Extensible**: Easily define your own operators, executors and extend the library so that it fits the level of abstraction that suits your environment.
-- **Elegant**: Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful **Jinja** templating engine.
-- **Scalable**: Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity.
+.. image:: /img/hello_world_grid_view.png
+  :alt: Demo DAG in the Grid View, showing the status of all DAG runs
+
+Each column represents one DAG run. These are two of the most used views in Airflow, but there are several
+other views which allow you to deep dive into the state of your workflows.
-Beyond the Horizon
-------------------
+Why Airflow?
+=========================================
+
+Airflow is a batch workflow orchestration platform. The Airflow framework contains operators to connect with
+many technologies and is easily extensible to connect with a new technology. If your workflows have a clear
+start and end, and run at regular intervals, they can be programmed as an Airflow DAG.
+
+If you prefer coding over clicking, Airflow is the tool for you. Workflows are defined as Python code, which
+means:
+
+- Workflows can be stored in version control so that you can roll back to previous versions
+- Workflows can be developed by multiple people simultaneously
+- Tests can be written to validate functionality
+- Components are extensible and you can build on a wide collection of existing components
+
+Rich scheduling and execution semantics enable you to easily define complex pipelines, running at regular
+intervals. Backfilling allows you to (re-)run pipelines on historical data after making changes to your logic.
+And the ability to rerun partial pipelines after resolving an error helps maximize efficiency.
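As a back-of-the-envelope illustration of what a backfill covers (a hypothetical helper, not an Airflow API): a daily schedule with a start date in the past implies one run per elapsed interval, and backfilling (re-)runs those historical intervals:

```python
from datetime import date, timedelta

def daily_intervals(start: date, end: date) -> list[date]:
    """Hypothetical helper: the logical dates a daily schedule covers,
    start inclusive, end exclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days)]

# A daily DAG starting Jan 1st, backfilled up to Jan 5th,
# runs once for each of 4 logical dates
print(daily_intervals(date(2022, 1, 1), date(2022, 1, 5)))
```

Airflow tracks the state of each such interval per DAG, which is what makes targeted re-runs of historical data possible after a logic change.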
-Airflow **is not** a data streaming solution. Tasks do not move data from
-one to the other (though tasks can exchange metadata!). Airflow is not
-in the `Spark Streaming <http://spark.apache.org/streaming/>`_
-or `Storm <https://storm.apache.org/>`_ space, it is more comparable to
-`Oozie <http://oozie.apache.org/>`_ or
-`Azkaban <https://azkaban.github.io/>`_.
+Airflow's user interface provides both in-depth views of pipelines and individual tasks, and an overview of
+pipelines over time. From the interface, you can inspect logs and manage tasks, for example retrying a task in
+case of failure.
+
+The open-source nature of Airflow ensures you work on components developed, tested, and used by many other
+`companies <https://github.com/apache/airflow/blob/main/INTHEWILD.md>`_ around the world. In the active
+`community <https://airflow.apache.org/community>`_ you can find plenty of helpful resources in the form of
+blog posts, articles, conferences, books, and more. You can connect with peers via several channels
+such as `Slack <https://s.apache.org/airflow-slack>`_ and mailing lists.
+
+Why not Airflow?
+=========================================
+
+Airflow was built for finite batch workflows. While the CLI and REST API do allow triggering workflows,
+Airflow was not built for infinitely running event-based workflows. Airflow is not a streaming solution.
+However, a streaming system such as Apache Kafka is often seen working together with Apache Airflow. Kafka can
+be used for ingestion and processing in real time, the event data is written to a storage location, and Airflow
+periodically starts a workflow that processes a batch of that data.
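A hedged sketch of that batch-windowing pattern (hypothetical field names, no Kafka client involved): events stream in with timestamps, land in storage, and each scheduled Airflow run picks up only the events belonging to its window:

```python
from datetime import datetime

# Hypothetical event records as they might land in storage from a stream
events = [
    {"ts": datetime(2022, 1, 1, 9, 30), "payload": "a"},
    {"ts": datetime(2022, 1, 1, 23, 59), "payload": "b"},
    {"ts": datetime(2022, 1, 2, 0, 15), "payload": "c"},
]

def batch_for_window(events: list, start: datetime, end: datetime) -> list:
    """Select the events a single scheduled run would process."""
    return [e for e in events if start <= e["ts"] < end]

# The run for Jan 1st processes only that day's events
batch = batch_for_window(events, datetime(2022, 1, 1), datetime(2022, 1, 2))
print([e["payload"] for e in batch])
# prints: ['a', 'b']
```

The half-open window (start inclusive, end exclusive) ensures every event is assigned to exactly one batch, so consecutive runs never overlap or miss data.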
-Workflows are expected to be mostly static or slowly changing. You can think
-of the structure of the tasks in your workflow as slightly more dynamic
-than a database structure would be. Airflow workflows are expected to look
-similar from a run to the next, this allows for clarity around
-unit of work and continuity.
+If you prefer clicking over coding, Airflow is probably not the right solution. The web interface aims to make
+managing workflows as easy as possible and the Airflow framework is continuously improved to make the
+developer experience as smooth as possible. However, the philosophy of Airflow is to define workflows as code,
+so coding will always be required.
.. toctree::
:hidden:
:caption: Content
- Home <self>
+ Overview <self>
project
license
start/index
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 3af1c984b6..e6779724bf 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -1157,6 +1157,7 @@ param
parametable
parameterType
parameterValue
+parameterization
parameterizing
paramiko
params