SamWheating opened a new pull request #17121: URL: https://github.com/apache/airflow/pull/17121
Closes: https://github.com/apache/airflow/issues/11901

This PR ensures that the active DAGs in the DB are all actually present in their corresponding Python files by reconciling the DB state with the contents of a given DAG file on every parse operation.

This _should_ prevent a lot of the issues encountered when writing multiple DAGs per file, renaming DAGs, or dynamically generating DAGs based on a config file read at parse time.

#### Validation:

I have validated these changes in a local breeze environment with the following DAG:

```python
from airflow.models import DAG
from airflow import utils
from airflow.operators.python import PythonOperator

# Number of DAG objects this file generates at parse time.
NUM_DAGS = 1


def message():
    print('Hello, world.')


for i in range(NUM_DAGS):
    with DAG(f'dag-{i}', schedule_interval=None, start_date=utils.dates.days_ago(1)) as dag:
        task = PythonOperator(
            task_id='task',
            python_callable=message
        )
    # Expose each DAG at module level so the DAG parser can discover it.
    globals()[f"dag_{i}"] = dag
```

By changing the value of `NUM_DAGS` I can quickly change the number of DAG objects present in this file.

Before this change, decreasing the value of `NUM_DAGS` would leave a bunch of stale DAGs in the UI. These could be triggered but would then fail, as the executor was not able to load the specified task from the file.

After implementing this change, stale DAGs disappear from the UI shortly after decreasing the value of `NUM_DAGS`.

(I will add some tests as well once I'm confident that this is the correct approach to fix the issue.)

#### Questions:

1. Is there a good reason for Airflow to mark inactive DAGs as active if the file still exists?
I looked through the [original PR which introduced this](https://github.com/apache/airflow/pull/5743/files) but couldn't find an explanation.
2. How significant is the performance hit incurred by updating the DAG table on every parse operation?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
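
For readers following the description above, the per-file reconciliation it proposes can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `DagRow`, `deactivate_missing_dags`, and the file paths are hypothetical stand-ins for the DAG table and the parser output, and a plain in-memory list replaces the database.

```python
from dataclasses import dataclass


@dataclass
class DagRow:
    """Hypothetical stand-in for one row of the DAG table."""
    dag_id: str
    fileloc: str
    is_active: bool = True


def deactivate_missing_dags(rows, fileloc, parsed_dag_ids):
    """Deactivate active rows for `fileloc` that the latest parse did not produce.

    Returns the list of dag_ids that were deactivated.
    """
    parsed = set(parsed_dag_ids)
    deactivated = []
    for row in rows:
        if row.fileloc == fileloc and row.is_active and row.dag_id not in parsed:
            row.is_active = False
            deactivated.append(row.dag_id)
    return deactivated


# Simulate lowering NUM_DAGS from 3 to 1 in a single (hypothetical) DAG file.
table = [DagRow(f"dag-{i}", "/dags/example.py") for i in range(3)]
table.append(DagRow("other", "/dags/other.py"))  # belongs to a different file

stale = deactivate_missing_dags(table, "/dags/example.py", ["dag-0"])
print(stale)
```

Only rows whose `fileloc` matches the parsed file are touched, so DAGs defined in other files keep their state. In a real deployment the same filter would presumably be a single `UPDATE` against the DAG table, which is where the performance question above comes in.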
