This is an automated email from the ASF dual-hosted git repository.

basph pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new 32f5eb1e8d Rewrite the Airflow documentation home page (#24795)
32f5eb1e8d is described below

commit 32f5eb1e8da837eac7cd475f8a7baa9ed21fa351
Author: Bas Harenslak <[email protected]>
AuthorDate: Mon Jul 11 15:16:53 2022 +0200

    Rewrite the Airflow documentation home page (#24795)
    
    * Rewrite the Airflow home page
    
    * Rename home to overview
    
    * Ignore parameterization
    
    * Remove history since that can be read elsewhere
    
    * Link companies to inthewild.md
    
    * Process comment
---
 docs/apache-airflow/img/airflow.gif                | Bin 416302 -> 0 bytes
 docs/apache-airflow/img/hello_world_graph_view.png | Bin 0 -> 73688 bytes
 docs/apache-airflow/img/hello_world_grid_view.png  | Bin 0 -> 132851 bytes
 docs/apache-airflow/index.rst                      | 121 +++++++++++++++------
 docs/spelling_wordlist.txt                         |   1 +
 5 files changed, 86 insertions(+), 36 deletions(-)

diff --git a/docs/apache-airflow/img/airflow.gif b/docs/apache-airflow/img/airflow.gif
deleted file mode 100644
index 076fe8e978..0000000000
Binary files a/docs/apache-airflow/img/airflow.gif and /dev/null differ
diff --git a/docs/apache-airflow/img/hello_world_graph_view.png b/docs/apache-airflow/img/hello_world_graph_view.png
new file mode 100644
index 0000000000..18ef6eb4ff
Binary files /dev/null and b/docs/apache-airflow/img/hello_world_graph_view.png differ
diff --git a/docs/apache-airflow/img/hello_world_grid_view.png b/docs/apache-airflow/img/hello_world_grid_view.png
new file mode 100644
index 0000000000..e2140c17eb
Binary files /dev/null and b/docs/apache-airflow/img/hello_world_grid_view.png differ
diff --git a/docs/apache-airflow/index.rst b/docs/apache-airflow/index.rst
index d6a781e0c1..66ac7f9d45 100644
--- a/docs/apache-airflow/index.rst
+++ b/docs/apache-airflow/index.rst
@@ -15,65 +15,114 @@
     specific language governing permissions and limitations
     under the License.
 
+What is Airflow?
+=========================================
+
+`Apache Airflow <https://github.com/apache/airflow>`_ is an open-source platform for developing, scheduling,
+and monitoring batch-oriented workflows. Airflow's extensible Python framework enables you to build workflows
+connecting with virtually any technology. A web interface helps manage the state of your workflows. Airflow is
+deployable in many ways, varying from a single process on your laptop to a distributed setup to support even
+the biggest workflows.
 
+Workflows as code
+=========================================
+The main characteristic of Airflow workflows is that all workflows are defined in Python code. "Workflows as
+code" serves several purposes:
 
+- **Dynamic**: Airflow pipelines are configured as Python code, allowing for dynamic pipeline generation.
+- **Extensible**: The Airflow framework contains operators to connect with numerous technologies. All Airflow components are extensible to easily adjust to your environment.
+- **Flexible**: Workflow parameterization is built-in, leveraging the `Jinja <https://jinja.palletsprojects.com>`_ templating engine (see the sketch below).
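+
+As a minimal sketch of that parameterization (shown outside a DAG for brevity; the ``report`` task id is
+hypothetical, and ``ds`` is Airflow's built-in template variable for the logical date), a ``BashOperator``
+command is rendered by Jinja before execution:
+
+.. code-block:: python
+
+    from airflow.operators.bash import BashOperator
+
+    # The {{ ds }} placeholder is rendered by Jinja at runtime, e.g. "2022-01-01".
+    # In a real pipeline this operator would be defined inside a DAG.
+    report = BashOperator(
+        task_id="report",
+        bash_command="echo 'Processing data for {{ ds }}'",
+    )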
 
-.. image:: ../../airflow/www/static/pin_large.png
-    :width: 100
+Take a look at the following snippet of code:
 
-Apache Airflow Documentation
-=========================================
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.bash import BashOperator
+    from airflow.operators.python import PythonOperator
+
+    # A DAG represents a workflow, a collection of tasks
+    with DAG(dag_id="demo", start_date=datetime(2022, 1, 1), schedule_interval="0 0 * * *") as dag:
 
-Airflow is a platform to programmatically author, schedule and monitor
-workflows.
+        # Tasks are represented as operators
+        hello = BashOperator(task_id="hello", bash_command="echo hello")
+        airflow = PythonOperator(task_id="airflow", python_callable=lambda: print("airflow"))
 
-Use Airflow to author workflows as Directed Acyclic Graphs (DAGs) of tasks.
-The Airflow scheduler executes your tasks on an array of workers while
-following the specified dependencies. Rich command line utilities make
-performing complex surgeries on DAGs a snap. The rich user interface
-makes it easy to visualize pipelines running in production,
-monitor progress, and troubleshoot issues when needed.
+        # Set dependencies between tasks
+        hello >> airflow
 
-When workflows are defined as code, they become more maintainable,
-versionable, testable, and collaborative.
 
+Here you see:
 
+- A DAG named "demo", starting on Jan 1st 2022 and running once a day. A DAG is Airflow's representation of a workflow.
+- Two tasks, a BashOperator running a Bash command and a PythonOperator running a Python function.
+- ``>>`` between the tasks defines a dependency and controls in which order the tasks will be executed.
 
-.. image:: img/airflow.gif
+Airflow evaluates this script and executes the tasks at the set interval and in the defined order. The status
+of the "demo" DAG is visible in the web interface:
 
-------------
+.. image:: /img/hello_world_graph_view.png
+  :alt: Demo DAG in the Graph View, showing the status of one DAG run
 
-Principles
-----------
+This example demonstrates a simple Bash command and Python function, but these tasks can run arbitrary code.
+Think of running a Spark job, moving data between two buckets, or sending an email. The same structure can
+also be seen running over time:
 
-- **Dynamic**:  Airflow pipelines are configuration as code (Python), allowing for dynamic pipeline generation. This allows for writing code that instantiates pipelines dynamically.
-- **Extensible**:  Easily define your own operators, executors and extend the library so that it fits the level of abstraction that suits your environment.
-- **Elegant**:  Airflow pipelines are lean and explicit. Parameterizing your scripts is built into the core of Airflow using the powerful **Jinja** templating engine.
-- **Scalable**:  Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers. Airflow is ready to scale to infinity.
+.. image:: /img/hello_world_grid_view.png
+  :alt: Demo DAG in the Grid View, showing the status of all DAG runs
 
+Each column represents one DAG run. These are two of the most used views in Airflow, but there are several
+other views which allow you to deep dive into the state of your workflows.
 
-Beyond the Horizon
-------------------
+Why Airflow?
+=========================================
+Airflow is a batch workflow orchestration platform. The Airflow framework contains operators to connect with
+many technologies and is easily extensible to connect with a new technology. If your workflows have a clear
+start and end, and run at regular intervals, they can be programmed as an Airflow DAG.
+
+If you prefer coding over clicking, Airflow is the tool for you. Workflows are defined as Python code which
+means:
+
+- Workflows can be stored in version control so that you can roll back to previous versions
+- Workflows can be developed by multiple people simultaneously
+- Tests can be written to validate functionality (see the sketch after this list)
+- Components are extensible and you can build on a wide collection of existing components
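+
+As a minimal sketch of such a test (assuming the "demo" DAG shown above is in your configured DAG folder and
+``pytest`` runs the test), Airflow's ``DagBag`` can be used to validate that all DAGs parse correctly:
+
+.. code-block:: python
+
+    from airflow.models import DagBag
+
+
+    def test_demo_dag_is_valid():
+        # Parsing the DAG folder surfaces import errors such as syntax mistakes
+        dag_bag = DagBag(include_examples=False)
+        assert dag_bag.import_errors == {}
+
+        # The "demo" DAG loads and contains the two expected tasks
+        dag = dag_bag.get_dag("demo")
+        assert dag is not None
+        assert {task.task_id for task in dag.tasks} == {"hello", "airflow"}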
+
+Rich scheduling and execution semantics enable you to easily define complex pipelines, running at regular
+intervals. Backfilling allows you to (re-)run pipelines on historical data after making changes to your logic.
+And the ability to rerun partial pipelines after resolving an error helps maximize efficiency.
 
-Airflow **is not** a data streaming solution. Tasks do not move data from
-one to the other (though tasks can exchange metadata!). Airflow is not
-in the `Spark Streaming <http://spark.apache.org/streaming/>`_
-or `Storm <https://storm.apache.org/>`_ space, it is more comparable to
-`Oozie <http://oozie.apache.org/>`_ or
-`Azkaban <https://azkaban.github.io/>`_.
+Airflow's user interface provides both in-depth views of pipelines and individual tasks, and an overview of
+pipelines over time. From the interface, you can inspect logs and manage tasks, for example retrying a task in
+case of failure.
+
+The open-source nature of Airflow ensures you work on components developed, tested, and used by many other
+`companies <https://github.com/apache/airflow/blob/main/INTHEWILD.md>`_ around the world. In the active
+`community <https://airflow.apache.org/community>`_ you can find plenty of helpful resources in the form of
+blog posts, articles, conferences, books, and more. You can connect with peers via several channels
+such as `Slack <https://s.apache.org/airflow-slack>`_ and mailing lists.
+
+Why not Airflow?
+=========================================
+Airflow was built for finite batch workflows. While the CLI and REST API do allow triggering workflows,
+Airflow was not built for infinitely running, event-based workflows. Airflow is not a streaming solution.
+However, a streaming system such as Apache Kafka is often used together with Apache Airflow: Kafka handles
+ingestion and processing in real time, event data is written to a storage location, and Airflow periodically
+starts a workflow that processes a batch of that data.
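+
+As a minimal sketch of that pattern (the ``/data/events`` path and the ``process_batch`` function are
+hypothetical, and the Kafka consumer writing the event files is assumed to run outside Airflow):
+
+.. code-block:: python
+
+    from datetime import datetime
+
+    from airflow import DAG
+    from airflow.operators.python import PythonOperator
+
+
+    def process_batch(ds):
+        # Process, as one batch, the events a Kafka consumer wrote for this logical date
+        print(f"Processing events under /data/events/{ds}")
+
+
+    # Once a day, pick up whatever the streaming side has landed since the previous run
+    with DAG(dag_id="process_events", start_date=datetime(2022, 1, 1), schedule_interval="@daily") as dag:
+        PythonOperator(task_id="process_batch", python_callable=process_batch, op_kwargs={"ds": "{{ ds }}"})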
 
-Workflows are expected to be mostly static or slowly changing. You can think
-of the structure of the tasks in your workflow as slightly more dynamic
-than a database structure would be. Airflow workflows are expected to look
-similar from a run to the next, this allows for clarity around
-unit of work and continuity.
+If you prefer clicking over coding, Airflow is probably not the right solution. The web interface aims to make
+managing workflows as easy as possible, and the Airflow framework is continuously improved to make the
+developer experience as smooth as possible. However, the philosophy of Airflow is to define workflows as code,
+so coding will always be required.
 
 
 .. toctree::
     :hidden:
     :caption: Content
 
-    Home <self>
+    Overview <self>
     project
     license
     start/index
diff --git a/docs/spelling_wordlist.txt b/docs/spelling_wordlist.txt
index 3af1c984b6..e6779724bf 100644
--- a/docs/spelling_wordlist.txt
+++ b/docs/spelling_wordlist.txt
@@ -1157,6 +1157,7 @@ param
 parametable
 parameterType
 parameterValue
+parameterization
 parameterizing
 paramiko
 params
