SamWheating opened a new pull request #17121:
URL: https://github.com/apache/airflow/pull/17121


   Closes: https://github.com/apache/airflow/issues/11901
   
   This PR ensures that every DAG marked active in the DB is actually present 
in its corresponding Python file, by reconciling the DB state with the contents 
of a given DAG file on every parse operation. 
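
The reconciliation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `dag_table` dict is a hypothetical in-memory stand-in for the DAG table, the function name `deactivate_missing_dags` is invented for the example, and the `fileloc` / `is_active` keys are only meant to mirror the corresponding columns.

```python
from typing import Dict, Iterable, Set


def deactivate_missing_dags(
    dag_table: Dict[str, dict],
    fileloc: str,
    parsed_dag_ids: Iterable[str],
) -> Set[str]:
    """Deactivate DAGs recorded for `fileloc` that the latest parse did not find.

    Hypothetical helper: `dag_table` maps dag_id -> row and stands in for the DB.
    """
    parsed = set(parsed_dag_ids)
    # A DAG is stale if the DB says it is active in this file,
    # but the file no longer defines it.
    stale = {
        dag_id
        for dag_id, row in dag_table.items()
        if row["fileloc"] == fileloc and row["is_active"] and dag_id not in parsed
    }
    for dag_id in stale:
        dag_table[dag_id]["is_active"] = False
    return stale


# Assumed DB state: two DAGs previously parsed from one file, one from another.
dag_table = {
    "dag-0": {"fileloc": "/dags/example.py", "is_active": True},
    "dag-1": {"fileloc": "/dags/example.py", "is_active": True},
    "other": {"fileloc": "/dags/other.py", "is_active": True},
}

# The latest parse of /dags/example.py only found dag-0.
stale = deactivate_missing_dags(dag_table, "/dags/example.py", ["dag-0"])
print(sorted(stale))
```

Only rows belonging to the file that was just parsed are touched, so DAGs defined in other files keep their state.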
   
   This _should_ prevent a lot of issues encountered when writing multiple DAGs 
per-file, renaming DAGs or dynamically generating DAGs based on a config file 
read at parse-time.
   
   #### Validation:
   
   I have validated these changes in a local breeze environment with the 
following DAG:
   
   ```python
   from airflow.models import DAG
   from airflow.operators.python import PythonOperator
   from airflow.utils.dates import days_ago
   
   # Change this value to add or remove DAGs from the file.
   NUM_DAGS = 1
   
   def message():
       print('Hello, world.')
   
   for i in range(NUM_DAGS):
       with DAG(
           f'dag-{i}',
           schedule_interval=None,
           start_date=days_ago(1),
       ) as dag:
           task = PythonOperator(
               task_id='task',
               python_callable=message,
           )
       # Expose each DAG at module level so the parser can discover it.
       globals()[f'dag_{i}'] = dag
   
   ```
   
   By changing the value of `NUM_DAGS` I can quickly change the number of DAG 
objects present in this file. 
   
   Before this change, decreasing the value of `NUM_DAGS` would leave the 
removed DAGs in the UI as stale entries. These could still be triggered, but the 
runs would then fail because the executor could not load the specified task from 
the file.
   
   After implementing this change, stale DAGs disappear from the UI shortly 
after decreasing the value of `NUM_DAGS`.
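
As a worked example of which DAGs become stale, assume `NUM_DAGS` is lowered from 3 to 1 (these values are chosen only for illustration). The helper `dag_ids` below is hypothetical and simply reproduces the `dag-{i}` naming used in the validation file:

```python
def dag_ids(num_dags: int) -> set:
    """Return the DAG ids the validation file defines for a given NUM_DAGS."""
    return {f"dag-{i}" for i in range(num_dags)}


before = dag_ids(3)  # ids present before the edit
after = dag_ids(1)   # ids present after the edit

# DAGs that still exist in the DB but are no longer defined in the file.
stale = before - after
print(sorted(stale))  # ['dag-1', 'dag-2']
```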
   
   (I will add some tests as well once I'm confident that this is the correct 
approach to fix the issue).
   
   #### Questions:
   
   1. Is there a good reason for Airflow to mark inactive DAGs as active if the 
file still exists? I looked through the [original PR which introduced 
this](https://github.com/apache/airflow/pull/5743/files) but couldn't find an 
explanation.
   
   2. How significant is the performance hit incurred by updating the DAG table 
on every parse operation?
    

