This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-8-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ae0ceb382dc0dfb91adc5d22b480c321ffbe3fe0
Author: Jarek Potiuk <[email protected]>
AuthorDate: Wed Dec 20 19:44:33 2023 +0100

    Improve pre-commit to generate Airflow diagrams as a code (#36333)

    Since more diagrams in Airflow are being generated using the
    "diagram as a code" approach, this PR improves the pre-commit hook so
    that it can generate images that come from different sources, are
    placed in different directories and are generated independently. The
    whole process becomes more distributed, and whoever creates diagrams
    can easily add their own.

    The changes implemented in this PR:

    * The code that generates a diagram is now next to the diagram it
      generates. It has the same name as the diagram, but with the .py
      extension. This way the source of each diagram is immediately
      visible (right next to the diagram).

    * Each of the .py diagram files is runnable on its own. You can
      regenerate a diagram by running the corresponding Python file, or
      automate it with a "save" action that runs the Python code every
      time the file is saved. That makes for a very nice workflow when
      iterating on each diagram independently of the others.

    * The pre-commit script is given a set of folders to scan; it finds
      the diagrams and runs them on pre-commit. It also creates and
      verifies the md5sum hash of the source Python file separately for
      each diagram, and only runs diagram generation when the source
      file has changed since the hash was last saved and committed. The
      hash is stored next to the image and its source, with the .md5sum
      extension.

    Also updated the documentation in CONTRIBUTING.rst to explain how to
    generate the diagrams and how the generation mechanism works.

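The per-diagram hash check described in the last bullet can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the script from this commit: the helper names `file_md5`, `needs_regeneration` and `record_hash`, and the file name `diagram_example.py`, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of the file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def needs_regeneration(source: Path) -> bool:
    """True when no stored hash exists or it differs from the current one."""
    # Assumption: the hash file sits next to the source and shares its stem,
    # e.g. diagram_example.py -> diagram_example.md5sum.
    hash_file = source.with_suffix(".md5sum")
    return not hash_file.exists() or hash_file.read_text().strip() != file_md5(source)


def record_hash(source: Path) -> None:
    """Store the current hash next to the source after a successful run."""
    source.with_suffix(".md5sum").write_text(file_md5(source) + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram_example.py"  # hypothetical diagram source
        src.write_text("print('v1')\n")
        print(needs_regeneration(src))  # True: no .md5sum stored yet
        record_hash(src)
        print(needs_regeneration(src))  # False: stored hash matches
        src.write_text("print('v2')\n")
        print(needs_regeneration(src))  # True: source changed since last record
```

Because the hash is taken over the source file only, committing the .md5sum together with the source and the image is what keeps the hook from regenerating an unchanged diagram.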
    (cherry picked from commit b35b08ec41814b6fe5d7388296db83a726e6d6d0)
---
 .pre-commit-config.yaml                            |   4 +-
 .rat-excludes                                      |   3 +
 CONTRIBUTING.rst                                   |  45 ++++++++
 ...agram_fab_auth_manager_airflow_architecture.png | Bin 0 -> 81735 bytes
 ...iagram_auth_manager_airflow_architecture.md5sum |   1 +
 .../diagram_auth_manager_airflow_architecture.png  | Bin 0 -> 54220 bytes
 .../diagram_auth_manager_airflow_architecture.py   |  73 +++++++++++++
 .../img/diagram_basic_airflow_architecture.md5sum  |   1 +
 .../img/diagram_basic_airflow_architecture.py      |  77 +++++++++++++
 ...agram_dag_processor_airflow_architecture.md5sum |   1 +
 .../diagram_dag_processor_airflow_architecture.py  |  84 ++++++++++++++
 ...am_fab_auth_manager_airflow_architecture.md5sum |   1 +
 ...agram_fab_auth_manager_airflow_architecture.png | Bin 0 -> 49545 bytes
 ...iagram_fab_auth_manager_airflow_architecture.py |  74 +++++++++++++
 .../diagrams/python_multiprocess_logo.png          | Bin
 .../pre_commit_generate_airflow_diagrams.py        | 121 ++++-----------------
 16 files changed, 381 insertions(+), 104 deletions(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index c12be3a5f1..7a7b2a64e5 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -416,8 +416,8 @@ repos:
         name: Generate airflow diagrams
         entry: ./scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
         language: python
-        files: ^scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
-        pass_filenames: false
+        files: ^docs/.*/diagram_[^/]*\.py$
+        pass_filenames: true
         additional_dependencies: ['rich>=12.4.4', "diagrams>=0.23.4"]
       - id: update-supported-versions
         name: Updates supported versions in documentation
diff --git a/.rat-excludes b/.rat-excludes
index 751742b1af..d881787de9 100644
--- a/.rat-excludes
+++ b/.rat-excludes
@@ -145,3 +145,6 @@ doap_airflow.rdf
 
 # PKG-INFO file
 PKG-INFO
+
+# checksum files
+.*\.md5sum
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index c015e457bd..abb9ed59da 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -942,6 +942,51 @@ Documentation for ``apache-airflow`` package and other packages that are closely
 providers packages are in ``/docs/`` directory. For detailed information on documentation development,
 see: `docs/README.rst <docs/README.rst>`_
 
+Diagrams
+========
+
+We started to use (and gradually convert old diagrams to use it) `Diagrams <https://diagrams.mingrammer.com/>`_
+as our tool of choice to generate diagrams. The diagrams are generated from Python code and can be
+automatically updated when the code changes. The diagrams are generated using pre-commit hooks (See
+static checks below) but they can also be generated manually by running the corresponding Python code.
+
+To run the code you need to install the dependencies in the virtualenv you use to run it:
+* ``pip install diagrams rich``. You need to have graphviz installed in your
+system (``brew install graphviz`` on macOS for example).
+
+The source code of the diagrams are next to the generated diagram, the difference is that the source
+code has ``.py`` extension and the generated diagram has ``.png`` extension. The pre-commit hook
+ ``generate-airflow-diagrams`` will look for ``diagram_*.py`` files in the ``docs`` subdirectories
+to find them and runs them when the sources changed and the diagrams are not up to date (the
+pre-commit will automatically generate an .md5sum hash of the sources and store it next to the diagram
+file).
+
+In order to generate the diagram manually you can run the following command:
+
+.. code-block:: bash
+
+    python <path-to-diagram-file>.py
+
+You can also generate all diagrams by:
+
+.. code-block:: bash
+
+    pre-commit run generate-airflow-diagrams
+
+or with Breeze:
+
+.. code-block:: bash
+
+    breeze static-checks --type generate-airflow-diagrams --all-files
+
+When you iterate over a diagram, you can also setup a "save" action in your IDE to run the python
+file automatically when you save the diagram file.
+
+Once you've done iteration and you are happy with the diagram, you can commit the diagram, the source
+code and the .md5sum file. The pre-commit hook will then not run the diagram generation until the
+source code for it changes.
+
+
 Static code checks
 ==================
 
diff --git a/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png
new file mode 100644
index 0000000000..9c7a1d1561
Binary files /dev/null and b/docs/apache-airflow-providers-fab/img/diagram_fab_auth_manager_airflow_architecture.png differ
diff --git a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum
new file mode 100644
index 0000000000..ac3e24d848
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.md5sum
@@ -0,0 +1 @@
+5b82cba489898a46dcfe5f458eeee33b
diff --git a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png
new file mode 100644
index 0000000000..35f3f418f2
Binary files /dev/null and b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.png differ
diff --git a/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py
new file mode 100644
index 0000000000..453d17267c
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_auth_manager_airflow_architecture.py
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_auth_manager_airflow_diagram():
+    image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+
+    console.print(f"[bright_blue]Generating architecture image {image_file}")
+    with Diagram(
+        name="",
+        show=False,
+        direction="LR",
+        curvestyle="ortho",
+        filename=MY_FILENAME,
+    ):
+        user = User("User")
+        with Cluster("Airflow environment"):
+            webserver = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+            with Cluster("Provider X"):
+                auth_manager = Custom("X auth manager", PYTHON_MULTIPROCESS_LOGO.as_posix())
+            with Cluster("Core Airflow"):
+                auth_manager_interface = Custom(
+                    "Auth manager\ninterface", PYTHON_MULTIPROCESS_LOGO.as_posix()
+                )
+
+        (user >> Edge(color="black", style="solid", reverse=True, label="Access to the console") >> webserver)
+
+        (
+            webserver
+            >> Edge(color="black", style="solid", reverse=True, label="Is user authorized?")
+            >> auth_manager
+        )
+
+        (
+            auth_manager
+            >> Edge(color="black", style="dotted", reverse=False, label="Inherit")
+            >> auth_manager_interface
+        )
+
+    console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+    generate_auth_manager_airflow_diagram()
diff --git a/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum b/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum
new file mode 100644
index 0000000000..d20c0307d4
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_basic_airflow_architecture.md5sum
@@ -0,0 +1 @@
+ac9bd11824e7faf5ed5232ff242c3157
diff --git a/docs/apache-airflow/img/diagram_basic_airflow_architecture.py b/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
new file mode 100644
index 0000000000..d65a6ae83a
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_basic_airflow_architecture.py
@@ -0,0 +1,77 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from diagrams.programming.flowchart import MultipleDocuments
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_basic_airflow_diagram():
+    image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+
+    console.print(f"[bright_blue]Generating architecture image {image_file}")
+    with Diagram(
+        name="", show=False, direction="LR", curvestyle="ortho", filename=MY_FILENAME, outformat="png"
+    ):
+        with Cluster("Parsing & Scheduling"):
+            schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+        metadata_db = PostgreSQL("Metadata DB")
+
+        dag_author = User("DAG Author")
+        dag_files = MultipleDocuments("DAG files")
+
+        dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
+
+        with Cluster("Execution"):
+            workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+            triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+        schedulers - Edge(color="blue", style="dashed", taillabel="Executor") - workers
+
+        schedulers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+        workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+        triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+
+        operations_user = User("Operations User")
+        with Cluster("UI"):
+            webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+        webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
+
+        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
+
+        dag_files >> Edge(color="brown", style="solid") >> workers
+        dag_files >> Edge(color="brown", style="solid") >> schedulers
+        dag_files >> Edge(color="brown", style="solid") >> triggerer
+    console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+    generate_basic_airflow_diagram()
diff --git a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum
new file mode 100644
index 0000000000..ebe1a15d56
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.md5sum
@@ -0,0 +1 @@
+e189c45f79a7a878802bde13be27a112
diff --git a/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
new file mode 100644
index 0000000000..714049d349
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_dag_processor_airflow_architecture.py
@@ -0,0 +1,84 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from diagrams.programming.flowchart import MultipleDocuments
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_dag_processor_airflow_diagram():
+    dag_processor_architecture_image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+    console.print(f"[bright_blue]Generating architecture image {dag_processor_architecture_image_file}")
+    with Diagram(
+        name="",
+        show=False,
+        direction="LR",
+        curvestyle="ortho",
+        filename=MY_FILENAME,
+        outformat="png",
+    ):
+        operations_user = User("Operations User")
+        with Cluster("No DAG Python Code Execution", graph_attr={"bgcolor": "lightgrey"}):
+            with Cluster("Scheduling"):
+                schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+            with Cluster("UI"):
+                webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+        webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
+
+        metadata_db = PostgreSQL("Metadata DB")
+
+        dag_author = User("DAG Author")
+        with Cluster("DAG Python Code Execution"):
+            with Cluster("Execution"):
+                workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+                triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+            with Cluster("Parsing"):
+                dag_processors = Custom("DAG\nProcessor(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+            dag_files = MultipleDocuments("DAG files")
+
+        dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
+
+        workers - Edge(color="blue", style="dashed", headlabel="Executor") - schedulers
+
+        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
+        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> schedulers
+        dag_processors >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+        workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+        triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
+
+        dag_files >> Edge(color="brown", style="solid") >> workers
+        dag_files >> Edge(color="brown", style="solid") >> dag_processors
+        dag_files >> Edge(color="brown", style="solid") >> triggerer
+    console.print(f"[green]Generating architecture image {dag_processor_architecture_image_file}")
+
+
+if __name__ == "__main__":
+    generate_dag_processor_airflow_diagram()
diff --git a/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.md5sum b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.md5sum
new file mode 100644
index 0000000000..fb928aa691
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.md5sum
@@ -0,0 +1 @@
+aa73a8292341145e0f60682f7047503b
diff --git a/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png
new file mode 100644
index 0000000000..4057a67615
Binary files /dev/null and b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.png differ
diff --git a/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.py b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.py
new file mode 100644
index 0000000000..393d988bb2
--- /dev/null
+++ b/docs/apache-airflow/img/diagram_fab_auth_manager_airflow_architecture.py
@@ -0,0 +1,74 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from pathlib import Path
+
+from diagrams import Cluster, Diagram, Edge
+from diagrams.custom import Custom
+from diagrams.onprem.client import User
+from diagrams.onprem.database import PostgreSQL
+from rich.console import Console
+
+MY_DIR = Path(__file__).parent
+MY_FILENAME = Path(__file__).with_suffix("").name
+PYTHON_MULTIPROCESS_LOGO = MY_DIR.parents[1] / "diagrams" / "python_multiprocess_logo.png"
+
+console = Console(width=400, color_system="standard")
+
+
+def generate_fab_auth_manager_airflow_diagram():
+    image_file = (MY_DIR / MY_FILENAME).with_suffix(".png")
+    console.print(f"[bright_blue]Generating architecture image {image_file}")
+    with Diagram(
+        name="",
+        show=False,
+        direction="LR",
+        curvestyle="ortho",
+        filename=MY_FILENAME,
+    ):
+        user = User("User")
+        with Cluster("Airflow environment"):
+            webserver = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
+
+            with Cluster("FAB provider"):
+                fab_auth_manager = Custom("FAB auth manager", PYTHON_MULTIPROCESS_LOGO.as_posix())
+            with Cluster("Core Airflow"):
+                auth_manager_interface = Custom(
+                    "Auth manager\ninterface", PYTHON_MULTIPROCESS_LOGO.as_posix()
+                )
+
+        db = PostgreSQL("Metadata DB")
+
+        user >> Edge(color="black", style="solid", reverse=True, label="Access to the console") >> webserver
+        (
+            webserver
+            >> Edge(color="black", style="solid", reverse=True, label="Is user authorized?")
+            >> fab_auth_manager
+        )
+        (fab_auth_manager >> Edge(color="black", style="solid", reverse=True) >> db)
+        (
+            fab_auth_manager
+            >> Edge(color="black", style="dotted", reverse=False, label="Inherit")
+            >> auth_manager_interface
+        )
+
+    console.print(f"[green]Generating architecture image {image_file}")
+
+
+if __name__ == "__main__":
+    generate_fab_auth_manager_airflow_diagram()
diff --git a/images/diagrams/python_multiprocess_logo.png b/docs/diagrams/python_multiprocess_logo.png
similarity index 100%
rename from images/diagrams/python_multiprocess_logo.png
rename to docs/diagrams/python_multiprocess_logo.png
diff --git a/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py b/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
index 22f05715b1..f809d566e3 100755
--- a/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
+++ b/scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py
@@ -18,121 +18,38 @@
 from __future__ import annotations
 
 import hashlib
-import os
+import subprocess
+import sys
 from pathlib import Path
 
-from diagrams import Cluster, Diagram, Edge
-from diagrams.custom import Custom
-from diagrams.onprem.client import User
-from diagrams.onprem.database import PostgreSQL
-from diagrams.programming.flowchart import MultipleDocuments
 from rich.console import Console
 
 console = Console(width=400, color_system="standard")
 
 LOCAL_DIR = Path(__file__).parent
 AIRFLOW_SOURCES_ROOT = Path(__file__).parents[3]
-DOCS_IMAGES_DIR = AIRFLOW_SOURCES_ROOT / "docs" / "apache-airflow" / "img"
-PYTHON_MULTIPROCESS_LOGO = AIRFLOW_SOURCES_ROOT / "images" / "diagrams" / "python_multiprocess_logo.png"
-BASIC_ARCHITECTURE_IMAGE_NAME = "diagram_basic_airflow_architecture"
-DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME = "diagram_dag_processor_airflow_architecture"
-DIAGRAM_HASH_FILE_NAME = "diagram_hash.txt"
 
 
-def generate_basic_airflow_diagram(filename: str):
-    basic_architecture_image_file = (DOCS_IMAGES_DIR / BASIC_ARCHITECTURE_IMAGE_NAME).with_suffix(".png")
-    console.print(f"[bright_blue]Generating architecture image {basic_architecture_image_file}")
-    with Diagram(name="", show=False, direction="LR", curvestyle="ortho", filename=filename):
-        with Cluster("Parsing & Scheduling"):
-            schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-
-        metadata_db = PostgreSQL("Metadata DB")
-
-        dag_author = User("DAG Author")
-        dag_files = MultipleDocuments("DAG files")
-
-        dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
-
-        with Cluster("Execution"):
-            workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-            triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-
-        schedulers - Edge(color="blue", style="dashed", taillabel="Executor") - workers
-
-        schedulers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-        workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-        triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-
-        operations_user = User("Operations User")
-        with Cluster("UI"):
-            webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-
-        webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
-
-        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
-
-        dag_files >> Edge(color="brown", style="solid") >> workers
-        dag_files >> Edge(color="brown", style="solid") >> schedulers
-        dag_files >> Edge(color="brown", style="solid") >> triggerer
-    console.print(f"[green]Generating architecture image {basic_architecture_image_file}")
-
-
-def generate_dag_processor_airflow_diagram(filename: str):
-    dag_processor_architecture_image_file = (
-        DOCS_IMAGES_DIR / DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME
-    ).with_suffix(".png")
-    console.print(f"[bright_blue]Generating architecture image {dag_processor_architecture_image_file}")
-    with Diagram(name="", show=False, direction="LR", curvestyle="ortho", filename=filename):
-        operations_user = User("Operations User")
-        with Cluster("No DAG Python Code Execution", graph_attr={"bgcolor": "lightgrey"}):
-            with Cluster("Scheduling"):
-                schedulers = Custom("Scheduler(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-
-            with Cluster("UI"):
-                webservers = Custom("Webserver(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-
-        webservers >> Edge(color="black", style="dashed", reverse=True) >> operations_user
-
-        metadata_db = PostgreSQL("Metadata DB")
-
-        dag_author = User("DAG Author")
-        with Cluster("DAG Python Code Execution"):
-            with Cluster("Execution"):
-                workers = Custom("Worker(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-                triggerer = Custom("Triggerer(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-            with Cluster("Parsing"):
-                dag_processors = Custom("DAG\nProcessor(s)", PYTHON_MULTIPROCESS_LOGO.as_posix())
-            dag_files = MultipleDocuments("DAG files")
-
-        dag_author >> Edge(color="black", style="dashed", reverse=False) >> dag_files
-
-        workers - Edge(color="blue", style="dashed", headlabel="Executor") - schedulers
-
-        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> webservers
-        metadata_db >> Edge(color="red", style="dotted", reverse=True) >> schedulers
-        dag_processors >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-        workers >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-        triggerer >> Edge(color="red", style="dotted", reverse=True) >> metadata_db
-
-        dag_files >> Edge(color="brown", style="solid") >> workers
-        dag_files >> Edge(color="brown", style="solid") >> dag_processors
-        dag_files >> Edge(color="brown", style="solid") >> triggerer
-    console.print(f"[green]Generating architecture image {dag_processor_architecture_image_file}")
+def _get_file_hash(file_to_check: Path) -> str:
+    hash_md5 = hashlib.md5()
+    hash_md5.update(Path(file_to_check).resolve().read_bytes())
+    return hash_md5.hexdigest()
 
 
 def main():
-    hash_md5 = hashlib.md5()
-    hash_md5.update(Path(__file__).resolve().read_bytes())
-    my_file_hash = hash_md5.hexdigest()
-    hash_file = LOCAL_DIR / DIAGRAM_HASH_FILE_NAME
-    if not hash_file.exists() or not hash_file.read_text().strip() == str(my_file_hash).strip():
-        os.chdir(DOCS_IMAGES_DIR)
-        generate_basic_airflow_diagram(BASIC_ARCHITECTURE_IMAGE_NAME)
-        generate_dag_processor_airflow_diagram(DAG_PROCESSOR_AIRFLOW_ARCHITECTURE_IMAGE_NAME)
-        hash_file.write_text(str(my_file_hash) + "\n")
-    else:
-        console.print("[bright_blue]No changes to generation script. Not regenerating the images.")
+    # get all files as arguments
+    for arg in sys.argv[1:]:
+        source_file = Path(arg).resolve()
+        checksum = _get_file_hash(source_file)
+        hash_file = source_file.with_suffix(".md5sum")
+        if not hash_file.exists() or not hash_file.read_text().strip() == str(checksum).strip():
+            console.print(f"[bright_blue]Changes in {source_file}. Regenerating the image.")
+            subprocess.run(
+                [sys.executable, source_file.resolve().as_posix()], check=True, cwd=source_file.parent
+            )
+            hash_file.write_text(str(checksum) + "\n")
+        else:
+            console.print(f"[bright_blue]No changes in {source_file}. Not regenerating the image.")
 
 
 if __name__ == "__main__":

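The `files` pattern introduced in the `.pre-commit-config.yaml` hunk of this commit, together with `pass_filenames: true`, determines which paths are handed to the script as arguments. The sketch below checks a few paths against that pattern. It assumes that pre-commit applies `files` as a regular-expression search on repo-relative paths; the helper `is_diagram_source` and the path `docs/diagram_top_level.py` are hypothetical and only serve the illustration.

```python
import re

# Pattern copied from the `files:` entry of the generate-airflow-diagrams hook.
DIAGRAM_SOURCE_PATTERN = re.compile(r"^docs/.*/diagram_[^/]*\.py$")


def is_diagram_source(path: str) -> bool:
    """Return True if a repo-relative path matches the hook's `files` pattern."""
    # Assumption: pre-commit uses re.search semantics for `files`.
    return DIAGRAM_SOURCE_PATTERN.search(path) is not None


if __name__ == "__main__":
    candidates = [
        # Diagram source added in this commit.
        "docs/apache-airflow/img/diagram_basic_airflow_architecture.py",
        # Generated image: wrong extension, so it is not passed to the hook.
        "docs/apache-airflow/img/diagram_basic_airflow_architecture.png",
        # Hypothetical file directly under docs/: the pattern needs a subdirectory.
        "docs/diagram_top_level.py",
        # The hook script itself no longer triggers the hook.
        "scripts/ci/pre_commit/pre_commit_generate_airflow_diagrams.py",
    ]
    for candidate in candidates:
        print(is_diagram_source(candidate), candidate)
```

Only the first candidate matches, which is consistent with the CONTRIBUTING.rst text saying that the hook looks for ``diagram_*.py`` files in the ``docs`` subdirectories.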