[
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093775#comment-17093775
]
ASF subversion and git services commented on AIRFLOW-6796:
----------------------------------------------------------
Commit 6450834d97fde3fe67c2022ebaae2797ce1c1b12 in airflow's branch
refs/heads/master from MatthewRBruce
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=6450834 ]
[AIRFLOW-6796] Clean up DAG serializations based on last_updated (#7424)
DAG serializations were previous deleted based on whether the
DagFileProcessorManager had processed a particular python file. This
changes that to be based on the last time a DAG was processed by the
scheduler.
Also moves cleaning up of stale dags to the DagFileProcessorManager to
support long running schedulers
> Serialized DAGs can be incorrectly deleted
> ------------------------------------------
>
> Key: AIRFLOW-6796
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
> Project: Apache Airflow
> Issue Type: Bug
> Components: serialization
> Affects Versions: 1.10.9
> Reporter: Matthew Bruce
> Priority: Major
> Fix For: 1.10.11
>
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags`
> called from `DagFileProcessManager.refresh_dag_dir` can delete the
> serialization of DAGs if they were loaded via a DagBag and globals in a
> different `.py` file:
> Consider something like this:
> {{/home/airflow/dags/loader.py}}
> {code:python}
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags')
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags')
> for dag_bag in dag_bags:
> for dag in dag_bag:
> globals()[dag.dag_id] = dag{code}
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}}
> is only going to contain {{/home/airflow/dags/loader.py}} and the method will
> remove the serializations for the DAGs in dag-a.py and dag-b.py
> With non-serialized DAGs, airflow seems to mark DAGs as inactive based on
> when the scheduler last processed them - I wonder if we should make these two
> methods consistent?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)