Matthew Bruce created AIRFLOW-6796:
--------------------------------------
Summary: Serialized DAGs can be incorrectly deleted
Key: AIRFLOW-6796
URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
Project: Apache Airflow
Issue Type: Bug
Components: serialization
Affects Versions: 1.10.9
Reporter: Matthew Bruce
With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags`
called from `DagFileProcessManager.refresh_dag_dir` can delete the
serialization of DAGs if they were loaded via a DagBag and globals in a
different `.py` file:
Consider something like this:
`/home/airflow/dags/loader.py`
```
dags = []
dags.append(models.DagBag('/home/airflow/project-a/dags')
dags.append(models.DagBag('/home/airflow/project-b/dags')
globals().update(dags)
```
with files:
`/home/airflow/project-a/dags/dag-a.py`
`/home/airflow/project-b/dags/dag-b.py`
The list of file paths passed to `SerializedDagModel.remove_deleted_dags` is
only going to contain `/home/airflow/dags/loader.py` and the method will remove
the serializations for the DAGs in dag-a.py and dag-b.py
With non-serialized DAGs, airflow seems to mark DAGs as inactive based on when
the scheduler last processed them - I wonder if we should make these two
methods consistent?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)