mik-laj commented on a change in pull request #6247: [AIRFLOW-5588] Add
Celery's architecture diagram
URL: https://github.com/apache/airflow/pull/6247#discussion_r333026452
##########
File path: docs/executor/celery.rst
##########
@@ -72,3 +72,74 @@ Some caveats:
- Make sure to set a visibility timeout in [celery_broker_transport_options]
that exceeds the ETA of your longest running task
- Tasks can consume resources. Make sure your worker has enough resources to
run ``worker_concurrency`` tasks
- Queue names are limited to 256 characters, but each broker backend might
have its own restrictions
+
+Architecture
+------------
+
+.. graphviz::
+
+ digraph A{
+ rankdir="TB"
+ node[shape="rectangle", style="rounded"]
+
+
+ subgraph cluster {
+ label="Cluster";
+ {rank = same; dag; database}
+ {rank = same; workers; scheduler; web}
+
+ workers[label="Workers"]
+ scheduler[label="Scheduler"]
+ web[label="Web server"]
+ database[label="Database"]
+ dag[label="DAG files"]
+
+ subgraph cluster_queue {
+ label="Queue";
+ {rank = same; queue_broker; queue_result_backend}
+ queue_broker[label="Queue broker"]
+ queue_result_backend[label="Result backend"]
+ }
+
+ scheduler->workers[label="1"]
+ web->database[label="2"]
+ web->dag[label="3"]
+
+ workers->database[label="4"]
+ workers->dag[label="5"]
+ workers->queue_result_backend[label="6"]
+ workers->queue_broker[label="7"]
+
+ scheduler->database[label="8"]
+ scheduler->dag[label="9"]
+ scheduler->queue_result_backend[label="10"]
+ scheduler->queue_broker[label="11"]
+ }
+ }
+
+Airflow consists of several components:
+
+* **Workers** - Execute the assigned tasks
+* **Scheduler** - Responsible for adding the necessary tasks to the queue
+* **Web server** - HTTP server that provides access to DAG/task status information
+* **Database** - Contains information about the status of tasks, DAGs,
Variables, connections, etc.
+* **Queue** - Queue mechanism provided by Celery
+
+Please note that the queue in Celery consists of two components:
+
+* **Broker** - Stores commands for execution
+* **Result backend** - Stores status of completed command
+
+The components communicate with each other in many places
+
+* [1] **Scheduler** --> **Workers** - Fetches task execution logs
Review comment:
I checked it out. This is true. Webserver fetches logs.
https://github.com/apache/airflow/blob/d719e1f/airflow/www/views.py#L554-L572
https://github.com/apache/airflow/blob/d719e1fd6705a93a0dfefef4b46478ade5e006ea/airflow/utils/log/file_task_handler.py#L110-L132
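For anyone following the second link, here is a minimal sketch of the idea as I read it, not the actual handler code: when the log file is not available locally, the webserver builds an HTTP URL pointing at the log server on the worker that ran the task and fetches the log from there. The helper name `build_worker_log_url` and the sample hostname and path are made up for illustration, and 8793 is assumed to be the configured `worker_log_server_port`.

```python
def build_worker_log_url(hostname, log_relative_path, port=8793):
    """Return the URL the webserver would request to read a task log
    from the worker's log server.

    Hypothetical helper for illustration only; `port` is assumed to match
    the `worker_log_server_port` setting on the worker.
    """
    # Drop a leading slash so the path joins cleanly under /log/.
    relative = log_relative_path.lstrip("/")
    return "http://{}:{}/log/{}".format(hostname, port, relative)


if __name__ == "__main__":
    # Sample values, not taken from a real deployment.
    print(build_worker_log_url("worker-1", "example_dag/example_task/2019-10-09T00:00:00/1.log"))
```

The actual request (an HTTP GET against that URL) is left out on purpose, so the sketch only shows the direction of the connection: web server to worker.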
I updated this PR.