Re: [PR] Include plugins in the architecture diagrams [airflow]

via GitHub Wed, 03 Jan 2024 10:51:23 -0800


potiuk commented on code in PR #36513:
URL: https://github.com/apache/airflow/pull/36513#discussion_r1440807358



##########
docs/apache-airflow/core-concepts/overview.rst:
##########
@@ -18,49 +18,151 @@
 Architecture Overview
 =====================
 
-Airflow is a platform that lets you build and run *workflows*. A workflow is represented as a :doc:`DAG <dags>` (a Directed Acyclic Graph), and contains individual pieces of work called :doc:`tasks`, arranged with dependencies and data flows taken into account.
+Airflow is a platform that lets you build and run *workflows*. A workflow is represented as a
+:doc:`DAG <dags>` (a Directed Acyclic Graph), and contains individual pieces of work called
+:doc:`tasks`, arranged with dependencies and data flows taken into account.
 
 .. image:: ../img/edge_label_example.png
   :alt: An example Airflow DAG, rendered in Graph
 
-A DAG specifies the dependencies between Tasks, and the order in which to execute them and run retries; the Tasks themselves describe what to do, be it fetching data, running analysis, triggering other systems, or more.
+A DAG specifies the dependencies between Tasks, and the order in which to execute them and run retries;
+the Tasks themselves describe what to do, be it fetching data, running analysis, triggering other systems,
+or more.
 
-An Airflow installation generally consists of the following components:
+Airflow components
+------------------
 
-* A :doc:`scheduler <../administration-and-deployment/scheduler>`, which handles both triggering scheduled workflows, and submitting :doc:`tasks` to the executor to run.
+Required components
+...................
 
-* An :doc:`executor <executor/index>`, which handles running tasks. In the default Airflow installation, this runs everything *inside* the scheduler, but most production-suitable executors actually push task execution out to *workers*.
+Minimal Airflow installation consists of the following components:
 
-* A *triggerer*, which executes deferred tasks - executed in an async-io event loop.
+* A :doc:`scheduler <../administration-and-deployment/scheduler>`, which handles both triggering scheduled
+  workflows, and submitting :doc:`tasks` to the executor to run. The :doc:`executor <executor/index>`, is
+  a configuration property of the *scheduler*, not a separate component and runs within the scheduler
+  process. There are several executors available out of the box, and you can also write your own.
 
-* A *webserver*, which presents a handy user interface to inspect, trigger and debug the behaviour of DAGs and tasks.
+* A *webserver*, which presents a handy user interface to inspect, trigger and debug the behaviour of
+  DAGs and tasks.
 
-* A folder of *DAG files*, read by the scheduler and executor (and any workers the executor has)
+* A folder of *DAG files*, is read by the *scheduler* to figure out what tasks to run and when and to
+  run them.
 
-* A *metadata database*, used by the scheduler, executor and webserver to store state.
+* A *metadata database*, used by the *scheduler*, and *webserver* to store state of workflows and tasks.
+  Setting up a metadata database is described in :doc:`/howto/set-up-database` and is required for
+  Airflow to work.
 
+Optional components
+...................
 
-Basic airflow architecture
---------------------------
+There are also some optional components that are not present in the basic installation
 
-This is the basic architecture of Airflow that you'll see in simple installations:
+* Optional *worker*, which executes the tasks given to it by the scheduler. In the basic installation
+  worker might be part of the scheduler not a separate component. It can be run as a long running process
+  in the :doc:`CeleryExecutor <executor/celery>`, or as a POD in the
+  :doc:`KubernetesExecutor <executor/kubernetes>`.
+
+* Optional *triggerer*, which executes deferred tasks in an async-io event loop. In basic installation
+  where deferred tasks are not used, triggerer might not be present. More about deferring tasks can be
+  found in :doc:`/authoring-and-scheduling/deferring`.
+
+* Optional *dag processor*, which parses DAG files and synchronizes them into the
+  *metadata database* in basic installation *dag processor* might be part of the scheduler not
+  a separate component.
+
+* A folder of *DAG files*, is read by  *dag processor*, *workers* and *triggerer* when they are running.
+  If *dag processor* is present *scheduler** does not need to read the *DAG files* directly. More about
+  processing DAG files can be found in :doc:`/authoring-and-scheduling/dagfile-processing`

Review Comment:
   I merged this with the **dag processor** paragraph above:
   
   ```
   * Optional *dag processor*, which parses DAG files and serializes them into the
     *metadata database*. By default, the *dag processor* process is part of the scheduler, but it can
     be run as a separate component for scalability and security reasons. If the *dag processor* is
     present, the *scheduler* does not need to read the *DAG files* directly. More about
     processing DAG files can be found in :doc:`/authoring-and-scheduling/dagfile-processing`.
   ```
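   
   For context, a minimal sketch of how the *dag processor* might be switched to a separate component. It assumes Airflow 2.x (2.3 or later), where the option is `standalone_dag_processor` in the `[scheduler]` section of `airflow.cfg`; treat it as an illustration and check it against the configuration reference for the version in use:
   
   ```ini
   # airflow.cfg (assumption: Airflow 2.x, 2.3 or later)
   [scheduler]
   # When True, DAG file parsing is expected to run in its own process
   # instead of inside the scheduler.
   standalone_dag_processor = True
   ```
   
   With that option set, the separate process is started with the `airflow dag-processor` command.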
   
    



