This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new b35b08ec41 Improve pre-commit to generate Airflow diagrams as a code
(#36333)
b35b08ec41 is described below
commit b35b08ec41814b6fe5d7388296db83a726e6d6d0
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed Dec 20 19:44:33 2023 +0100
Improve pre-commit to generate Airflow diagrams as a code (#36333)
Since we are getting more diagrams generated in Airflow using the
"diagram as a code" approach, this PR improves the pre-commit to be
more suitable to support generation of more of the images coming
from different sources, placed in different directories and generated
independently, so that the whole process is more distributed and easy
for whoever creates diagrams to add their own diagram.
The changes implemented in this PR:
* the code to generate each diagram now lives next to the diagram it
generates. It has the same name as the diagram, but with the .py
extension. This way it is immediately visible where the source
of each diagram is (right next to the diagram)
* each of the diagram .py files is runnable on its own. This
way you can easily regenerate a diagram by running the corresponding
Python file, or even automate it with a "save" action that runs
the Python code and regenerates the diagram every time
the file is saved. That makes for a very nice workflow when iterating
on each diagram, independently of the others
* the pre-commit script is given a set of folders which should be
scanned, and it finds and runs the diagrams on pre-commit. It also
creates and verifies the md5sum hash of the source Python file
separately for each diagram and only runs diagram generation when
the source file has changed since the hash was last saved and
committed. The hash is stored next to the image and its source,
with the .md5sum extension
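
A minimal sketch of the per-diagram hash check described in the last
bullet, using only the standard library. The helper names
(file_md5, needs_regeneration, record_hash) are hypothetical and chosen
for illustration; they are not the names used in the pre-commit script.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of the file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_regeneration(source_file: Path) -> bool:
    """True when no stored hash exists or it differs from the current one."""
    # The hash file sits next to the source: diagram_x.py -> diagram_x.md5sum
    hash_file = source_file.with_suffix(".md5sum")
    if not hash_file.exists():
        return True
    return hash_file.read_text().strip() != file_md5(source_file)


def record_hash(source_file: Path) -> None:
    """Store the current hash so the next run can skip an unchanged source."""
    source_file.with_suffix(".md5sum").write_text(file_md5(source_file) + "\n")
```

A caller would regenerate the image only when needs_regeneration()
returns True and call record_hash() afterwards, so that committing the
.md5sum file keeps later runs from regenerating an unchanged diagram.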
Also updated the documentation in CONTRIBUTING.rst to explain how
to generate the diagrams and how the generation mechanism
works.
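
As an illustration of which files the hook picks up, here is a small
sketch that applies the "files" pattern from the .pre-commit-config.yaml
change in this commit. It assumes pre-commit treats the pattern as a
Python regular expression searched against the repository-relative path;
the function name is_diagram_source is hypothetical.

```python
import re

# Pattern copied from the "files" entry of the generate-airflow-diagrams hook.
DIAGRAM_SOURCE_PATTERN = re.compile(r"^docs/.*/diagram_[^/]*\.py$")


def is_diagram_source(relative_path: str) -> bool:
    """Return True when the repository-relative path matches the hook pattern."""
    return DIAGRAM_SOURCE_PATTERN.search(relative_path) is not None
```

Note that the pattern requires at least one directory level between
"docs/" and the file, so a diagram_*.py file placed directly in "docs"
would not match.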
---
.pre-commit-config.yaml | 4 +-
.rat-excludes | 3 +
CONTRIBUTING.rst | 45 +++++
...am_fab_auth_manager_airflow_architecture.md5sum | 1 +
...agram_fab_auth_manager_airflow_architecture.png | Bin 81823 -> 81735 bytes
...iagram_fab_auth_manager_airflow_architecture.py | 74 +++++++
...iagram_auth_manager_airflow_architecture.md5sum | 1 +
.../diagram_auth_manager_airflow_architecture.png | Bin 53958 -> 54220 bytes
.../diagram_auth_manager_airflow_architecture.py | 73 +++++++
.../img/diagram_basic_airflow_architecture.md5sum | 1 +
.../img/diagram_basic_airflow_architecture.png | Bin 100899 -> 87096 bytes
.../img/diagram_basic_airflow_architecture.py | 77 ++++++++
...agram_dag_processor_airflow_architecture.md5sum | 1 +
.../diagram_dag_processor_airflow_architecture.png | Bin 121666 -> 106642 bytes
.../diagram_dag_processor_airflow_architecture.py | 84 ++++++++
...agram_fab_auth_manager_airflow_architecture.png | Bin 0 -> 49545 bytes
.../diagrams/python_multiprocess_logo.png | Bin
.../pre_commit_generate_airflow_diagrams.py | 217 ++-------------------
18 files changed, 381 insertions(+), 200 deletions(-)
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index dfed80447b..18c1d7d64c 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -416,8 +416,8 @@ repos:
name: Generate airflow diagrams
entry: ./scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
language: python
- files: ^scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
- pass_filenames: false
+ files: ^docs/.*/diagram_[^/]*\.py$
+ pass_filenames: true
additional_dependencies: ['rich>=12.4.4', "diagrams>=0.23.4"]
- id: update-supported-versions
name: Updates supported versions in documentation
diff --git a/.rat-excludes b/.rat-excludes
index 751742b1af..d881787de9 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -145,3 +145,6 @@ doap_airflow.rdf
# PKG-INFO file
PKG-INFO
+
+# checksum files
+.*\.md5sum
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index 818ca6120d..3640b1eb31 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -981,6 +981,51 @@ Documentation for ``apache-airflow`` package and other packages that are closely
providers packages are in ``/docs/`` directory. For detailed information on documentation development,
see: `docs/README.rst <docs/README.rst>`_
+Diagrams
+========
+
+We started to use (and gradually convert old diagrams to use it) `Diagrams <https://diagrams.mingrammer.com/>`_
+as our tool of choice to generate diagrams. The diagrams are generated from Python code and can be
+automatically updated when the code changes. The diagrams are generated using pre-commit hooks (See
+static checks below) but they can also be generated manually by running the corresponding Python code.
+
+To run the code you need to install the dependencies in the virtualenv you use to run it:
+* ``pip install diagrams rich``. You need to have graphviz installed in your
+system (``brew install graphviz`` on macOS for example).
+
+The source code of the diagrams are next to the generated diagram, the difference is that the source
+code has ``.py`` extension and the generated diagram has ``.png`` extension. The pre-commit hook
+ ``generate-airflow-diagrams`` will look for ``diagram_*.py`` files in the ``docs`` subdirectories
+to find them and runs them when the sources changed and the diagrams are not up to date (the
+pre-commit will automatically generate an .md5sum hash of the sources and store it next to the diagram
+file).
+
+In order to generate the diagram manually you can run the following command:
+
+.. code-block:: bash
+
+ python <path-to-diagram-file>.py
+
+You can also generate all diagrams by:
+
+.. code-block:: bash
+
+ pre-commit run generate-airflow-diagrams
+
+or with Breeze:
+
+.. code-block:: bash
+
+ breeze static-checks --type generate-airflow-diagrams --all-files
+
+When you iterate over a diagram, you can also setup a "save" action in your IDE to run the python
+file automatically when you save the diagram file.
+
+Once you've done iteration and you are happy with the diagram, you can commit the diagram, the source
+code and the .md5sum file. The pre-commit hook will then not run the diagram generation until the
+source code for it changes.
+
+
Static code checks
==================
diff --git
a/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.md5sum
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.md5sum
new file mode 100644
index 0000000000..fb928aa691
--- /dev/null
+++
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.md5sum
@@ -0,0 +1 @@
+aa73a8292341145e0f60682f7047503b
diff --git
a/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png
index 4299bb28d2..9c7a1d1561 100644
Binary files
a/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png
and
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png
differ
diff --git
a/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.py
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.py
new file mode 100644
index 0000000000..393d988bb2
--- /dev/null
+++
b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.py
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_fab_auth_manager_airflow_diagram():
+ image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+ console.print(f"[bright_blue]Generating architecture image {image_file}")
+ with Diagram(
+ name="",
+ show=False,
+ direction="LR",
+ curvestyle="ortho",
+ filename=MY_FILENAME,
+ ):
+ user = User("User")
+ with Cluster("Airflow environment"):
+ webserver = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ with Cluster("FAB provider"):
+ fab_auth_manager = Custom("FAB auth manager", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ with Cluster("Core Airflow"):
+ auth_manager_interface = Custom(
+ "Auth manager\ninterface", PYTHON_MULTIPROCESS_LOGO.as_posix()
+ )
+
+ db = PostgreSQL("Metadata DB")
+
+ user >> Edge(color="black", style="solid", reverse=True, label="Access to the console") >> webserver
+ (
+ webserver
+ >> Edge(color="black", style="solid", reverse=True, label="Is user authorized?")
+ >> fab_auth_manager
+ )
+ (fab_auth_manager >> Edge(color="black", style="solid", reverse=True) >> db)
+ (
+ fab_auth_manager
+ >> Edge(color="black", style="dotted", reverse=False, label="Inherit")
+ >> auth_manager_interface
+ )
+
+ console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+ generate_fab_auth_manager_airflow_diagram()
diff --git
a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum
b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum
new file mode 100644
index 0000000000..ac3e24d848
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum
@@ -0,0 +1 @@
+5b82cba489898a46dcfe5f458eeee33b
diff --git
a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png
b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png
index ba6cfaef61..35f3f418f2 100644
Binary files
a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png and
b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png differ
diff --git
a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py
b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py
new file mode 100644
index 0000000000..453d17267c
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_auth_manager_airflow_diagram():
+ image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+
+ console.print(f"[bright_blue]Generating architecture image {image_file}")
+ with Diagram(
+ name="",
+ show=False,
+ direction="LR",
+ curvestyle="ortho",
+ filename=MY_FILENAME,
+ ):
+ user = User("User")
+ with Cluster("Airflow environment"):
+ webserver = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ with Cluster("Provider X"):
+ auth_manager = Custom("X auth manager", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ with Cluster("Core Airflow"):
+ auth_manager_interface = Custom(
+ "Auth manager\ninterface", PYTHON_MULTIPROCESS_LOGO.as_posix()
+ )
+
+ (user >> Edge(color="black", style="solid", reverse=True, label="Access to the console") >> webserver)
+
+ (
+ webserver
+ >> Edge(color="black", style="solid", reverse=True, label="Is user authorized?")
+ >> auth_manager
+ )
+
+ (
+ auth_manager
+ >> Edge(color="black", style="dotted", reverse=False, label="Inherit")
+ >> auth_manager_interface
+ )
+
+ console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+ generate_auth_manager_airflow_diagram()
diff --git a/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum
b/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum
new file mode 100644
index 0000000000..d20c0307d4
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum
@@ -0,0 +1 @@
+ac9bd11824e7faf5ed5232ff242c3157
diff --git a/docs/apache-airflow/img/diagram_basic_airflow_architecture.png
b/docs/apache-airflow/img/diagram_basic_airflow_architecture.png
index 51f571e0e8..feae0a63bb 100644
Binary files a/docs/apache-airflow/img/diagram_basic_airflow_architecture.png
and b/docs/apache-airflow/img/diagram_basic_airflow_architecture.png differ
diff --git a/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
b/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
new file mode 100644
index 0000000000..d65a6ae83a
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
@@ -0,0 +1,77 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from diagrams.programming.flowchart import MultipleDocuments
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_basic_airflow_diagram():
+ image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+
+ console.print(f"[bright_blue]Generating architecture image {image_file}")
+ with Diagram(
+ name="", show=False, direction="LR", curvestyle="ortho", filename=MY_FILENAME, outformat="png"
+ ):
+ with Cluster("Parsing & Scheduling"):
+ schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ metadata_db = PostgreSQL("Metadata DB")
+
+ dag_author = User("DAG Author")
+ dag_files = MultipleDocuments("DAG files")
+
+ dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
+
+ with Cluster("Execution"):
+ workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ schedulers - Edge(color="blue", style="dashed", taillabel="Executor") - workers
+
+ schedulers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+ workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+ triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+
+ operations_user = User("Operations User")
+ with Cluster("UI"):
+ webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
+
+ metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
+
+ dag_files >> Edge(color="brown", style="solid") >> workers
+ dag_files >> Edge(color="brown", style="solid") >> schedulers
+ dag_files >> Edge(color="brown", style="solid") >> triggerer
+ console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+ generate_basic_airflow_diagram()
diff --git
a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum
b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum
new file mode 100644
index 0000000000..ebe1a15d56
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum
@@ -0,0 +1 @@
+e189c45f79a7a878802bde13be27a112
diff --git
a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.png
b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.png
index f44eaa35ec..8a2d48df19 100644
Binary files
a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.png and
b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.png differ
diff --git
a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
new file mode 100644
index 0000000000..714049d349
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
@@ -0,0 +1,84 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from diagrams.programming.flowchart import MultipleDocuments
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_dag_processor_airflow_diagram():
+ dag_processor_architecture_image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+ console.print(f"[bright_blue]Generating architecture image {dag_processor_architecture_image_file}")
+ with Diagram(
+ name="",
+ show=False,
+ direction="LR",
+ curvestyle="ortho",
+ filename=MY_FILENAME,
+ outformat="png",
+ ):
+ operations_user = User("Operations User")
+ with Cluster("No DAG Python Code Execution", graph_attr={"bgcolor": "lightgrey"}):
+ with Cluster("Scheduling"):
+ schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ with Cluster("UI"):
+ webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+ webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
+
+ metadata_db = PostgreSQL("Metadata DB")
+
+ dag_author = User("DAG Author")
+ with Cluster("DAG Python Code Execution"):
+ with Cluster("Execution"):
+ workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ with Cluster("Parsing"):
+ dag_processors = Custom("DAG\nProcessor(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+ dag_files = MultipleDocuments("DAG files")
+
+ dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
+
+ workers - Edge(color="blue", style="dashed", headlabel="Executor") - schedulers
+
+ metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
+ metadata_db >> Edge(color="red", style="dotted", reverse=True) >> schedulers
+ dag_processors >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+ workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+ triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+
+ dag_files >> Edge(color="brown", style="solid") >> workers
+ dag_files >> Edge(color="brown", style="solid") >> dag_processors
+ dag_files >> Edge(color="brown", style="solid") >> triggerer
+ console.print(f"[green]Generating architecture image {dag_processor_architecture_image_file}")
+
+
+if __name__ == "__main__":
+ generate_dag_processor_airflow_diagram()
diff --git
a/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png
b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png
new file mode 100644
index 0000000000..4057a67615
Binary files /dev/null and
b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png
differ
diff --git a/images/diagrams/python_multiprocess_logo.png
b/docs/diagrams/python_multiprocess_logo.png
similarity index 100%
rename from images/diagrams/python_multiprocess_logo.png
rename to docs/diagrams/python_multiprocess_logo.png
diff --git a/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
b/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
index 0afb9b9bf5..f809d566e3 100755
--- a/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
+++ b/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
@@ -18,217 +18,38 @@
from __future__ import annotations
import hashlib
-import os
+import subprocess
+import sys
from pathlib import Path
-from diagrams import Cluster, Diagram, Edge
-from diagrams.custom import Custom
-from diagrams.onprem.client import User
-from diagrams.onprem.database import PostgreSQL
-from diagrams.programming.flowchart import MultipleDocuments
from rich.console import Console
console = Console(width=400, color_system="standard")
LOCAL_DIR = Path(__file__).parent
AIRFLOW_SOURCES_ROOT = Path(__file__).parents[3]
-DOCS_IMAGES_DIR = AIRFLOW_SOURCES_ROOT / "docs" / "apache-airflow" / "img"
-FAB_PROVIDER_DOCS_IMAGES_DIR = AIRFLOW_SOURCES_ROOT / "docs" /
"apache-airflow-providers-fab" / "img"
-PYTHON_MULTIPROCESS_LOGO = AIRFLOW_SOURCES_ROOT / "images" / "diagrams" /
"python_multiprocess_logo.png"
-BASIC_ARCHITECTURE_IMAGE_NAME = "diagram_basic_airflow_architecture"
-DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME =
"diagram_dag_processor_airflow_architecture"
-AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME =
"diagram_auth_manager_airflow_architecture"
-FAB_AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME =
"diagram_fab_auth_manager_airflow_architecture"
-DIAGRAM_HASH_FILE_NAME = "diagram_hash.txt"
-
-def generate_basic_airflow_diagram():
- basic_architecture_image_file = (DOCS_IMAGES_DIR /
BASIC_ARCHITECTURE_IMAGE_NAME).with_suffix(".png")
- console.print(f"[bright_blue]Generating architecture image
{basic_architecture_image_file}")
- with Diagram(
- name="", show=False, direction="LR", curvestyle="ortho",
filename=BASIC_ARCHITECTURE_IMAGE_NAME
- ):
- with Cluster("Parsing & Scheduling"):
- schedulers = Custom("Scheduler(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- metadata_db = PostgreSQL("Metadata DB")
-
- dag_author = User("DAG Author")
- dag_files = MultipleDocuments("DAG files")
-
- dag_author >> Edge(color="black", style="dashed", reverse=False) >>
dag_files
-
- with Cluster("Execution"):
- workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
- triggerer = Custom("Triggerer(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- schedulers - Edge(color="blue", style="dashed", taillabel="Executor")
- workers
-
- schedulers >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
- workers >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
- triggerer >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
-
- operations_user = User("Operations User")
- with Cluster("UI"):
- webservers = Custom("Webserver(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- webservers >> Edge(color="black", style="dashed", reverse=True) >>
operations_user
-
- metadata_db >> Edge(color="red", style="dotted", reverse=True) >>
webservers
-
- dag_files >> Edge(color="brown", style="solid") >> workers
- dag_files >> Edge(color="brown", style="solid") >> schedulers
- dag_files >> Edge(color="brown", style="solid") >> triggerer
- console.print(f"[green]Generating architecture image
{basic_architecture_image_file}")
-
-
-def generate_dag_processor_airflow_diagram():
- dag_processor_architecture_image_file = (
- DOCS_IMAGES_DIR / DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME
- ).with_suffix(".png")
- console.print(f"[bright_blue]Generating architecture image
{dag_processor_architecture_image_file}")
- with Diagram(
- name="",
- show=False,
- direction="LR",
- curvestyle="ortho",
- filename=DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME,
- ):
- operations_user = User("Operations User")
- with Cluster("No DAG Python Code Execution", graph_attr={"bgcolor":
"lightgrey"}):
- with Cluster("Scheduling"):
- schedulers = Custom("Scheduler(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- with Cluster("UI"):
- webservers = Custom("Webserver(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- webservers >> Edge(color="black", style="dashed", reverse=True) >>
operations_user
-
- metadata_db = PostgreSQL("Metadata DB")
-
- dag_author = User("DAG Author")
- with Cluster("DAG Python Code Execution"):
- with Cluster("Execution"):
- workers = Custom("Worker(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
- triggerer = Custom("Triggerer(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
- with Cluster("Parsing"):
- dag_processors = Custom("DAG\nProcessor(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
- dag_files = MultipleDocuments("DAG files")
-
- dag_author >> Edge(color="black", style="dashed", reverse=False) >>
dag_files
-
- workers - Edge(color="blue", style="dashed", headlabel="Executor") -
schedulers
-
- metadata_db >> Edge(color="red", style="dotted", reverse=True) >>
webservers
- metadata_db >> Edge(color="red", style="dotted", reverse=True) >>
schedulers
- dag_processors >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
- workers >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
- triggerer >> Edge(color="red", style="dotted", reverse=True) >>
metadata_db
-
- dag_files >> Edge(color="brown", style="solid") >> workers
- dag_files >> Edge(color="brown", style="solid") >> dag_processors
- dag_files >> Edge(color="brown", style="solid") >> triggerer
- console.print(f"[green]Generating architecture image
{dag_processor_architecture_image_file}")
-
-
-def generate_auth_manager_airflow_diagram():
- auth_manager_architecture_image_file = (
- DOCS_IMAGES_DIR / AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME
- ).with_suffix(".png")
- console.print(f"[bright_blue]Generating architecture image
{auth_manager_architecture_image_file}")
- with Diagram(
- name="",
- show=False,
- direction="LR",
- curvestyle="ortho",
- filename=AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME,
- ):
- user = User("User")
- with Cluster("Airflow environment"):
- webserver = Custom("Webserver(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- with Cluster("Provider X"):
- auth_manager = Custom("X auth manager",
PYTHON_MULTIPROCESS_LOGO.as_posix())
- with Cluster("Core Airflow"):
- auth_manager_interface = Custom(
- "Auth manager\ninterface",
PYTHON_MULTIPROCESS_LOGO.as_posix()
- )
-
- (user >> Edge(color="black", style="solid", reverse=True,
label="Access to the console") >> webserver)
-
- (
- webserver
- >> Edge(color="black", style="solid", reverse=True, label="Is user
authorized?")
- >> auth_manager
- )
-
- (
- auth_manager
- >> Edge(color="black", style="dotted", reverse=False,
label="Inherit")
- >> auth_manager_interface
- )
-
- console.print(f"[green]Generating architecture image
{auth_manager_architecture_image_file}")
-
-
-def generate_fab_auth_manager_airflow_diagram():
- auth_manager_architecture_image_file = (
- FAB_PROVIDER_DOCS_IMAGES_DIR /
FAB_AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME
- ).with_suffix(".png")
- console.print(f"[bright_blue]Generating architecture image
{auth_manager_architecture_image_file}")
- with Diagram(
- name="",
- show=False,
- direction="LR",
- curvestyle="ortho",
- filename=FAB_AUTH_MANAGER_AIRFLOW_ARCHITECTURE_IMAGE_NAME,
- ):
- user = User("User")
- with Cluster("Airflow environment"):
- webserver = Custom("Webserver(s)",
PYTHON_MULTIPROCESS_LOGO.as_posix())
-
- with Cluster("FAB provider"):
- fab_auth_manager = Custom("FAB auth manager",
PYTHON_MULTIPROCESS_LOGO.as_posix())
- with Cluster("Core Airflow"):
- auth_manager_interface = Custom(
- "Auth manager\ninterface",
PYTHON_MULTIPROCESS_LOGO.as_posix()
- )
-
- db = PostgreSQL("Metadata DB")
-
- user >> Edge(color="black", style="solid", reverse=True, label="Access
to the console") >> webserver
- (
- webserver
- >> Edge(color="black", style="solid", reverse=True, label="Is user
authorized?")
- >> fab_auth_manager
- )
- (fab_auth_manager >> Edge(color="black", style="solid", reverse=True)
>> db)
- (
- fab_auth_manager
- >> Edge(color="black", style="dotted", reverse=False,
label="Inherit")
- >> auth_manager_interface
- )
-
- console.print(f"[green]Generating architecture image
{auth_manager_architecture_image_file}")
+def _get_file_hash(file_to_check: Path) -> str:
+    hash_md5 = hashlib.md5()
+    hash_md5.update(Path(file_to_check).resolve().read_bytes())
+    return hash_md5.hexdigest()
def main():
- hash_md5 = hashlib.md5()
- hash_md5.update(Path(__file__).resolve().read_bytes())
- my_file_hash = hash_md5.hexdigest()
- hash_file = LOCAL_DIR / DIAGRAM_HASH_FILE_NAME
- if not hash_file.exists() or not hash_file.read_text().strip() ==
str(my_file_hash).strip():
- os.chdir(DOCS_IMAGES_DIR)
- generate_basic_airflow_diagram()
- generate_dag_processor_airflow_diagram()
- generate_auth_manager_airflow_diagram()
- os.chdir(FAB_PROVIDER_DOCS_IMAGES_DIR)
- generate_fab_auth_manager_airflow_diagram()
- os.chdir(DOCS_IMAGES_DIR)
- hash_file.write_text(str(my_file_hash) + "\n")
- else:
- console.print("[bright_blue]No changes to generation script. Not
regenerating the images.")
+    # get all files as arguments
+    for arg in sys.argv[1:]:
+        source_file = Path(arg).resolve()
+        checksum = _get_file_hash(source_file)
+        hash_file = source_file.with_suffix(".md5sum")
+        if not hash_file.exists() or not hash_file.read_text().strip() == str(checksum).strip():
+            console.print(f"[bright_blue]Changes in {source_file}. Regenerating the image.")
+            subprocess.run(
+                [sys.executable, source_file.resolve().as_posix()], check=True, cwd=source_file.parent
+            )
+            hash_file.write_text(str(checksum) + "\n")
+        else:
+            console.print(f"[bright_blue]No changes in {source_file}. Not regenerating the image.")
if __name__ == "__main__":