[ 
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093775#comment-17093775
 ] 

ASF subversion and git services commented on AIRFLOW-6796:
----------------------------------------------------------

Commit 6450834d97fde3fe67c2022ebaae2797ce1c1b12 in airflow's branch 
refs/heads/master from MatthewRBruce
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=6450834 ]

[AIRFLOW-6796] Clean up DAG serializations based on last_updated (#7424)

DAG serializations were previously deleted based on whether the
DagFileProcessorManager had processed a particular Python file.  This
commit changes that to be based on the last time a DAG was processed by
the scheduler.

Also moves cleanup of stale DAGs to the DagFileProcessorManager to
support long-running schedulers.
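
To illustrate the timestamp-based cleanup described in the commit message, here is a minimal sketch in plain Python. It is not the code from the commit: `SerializedRow`, `remove_stale_rows`, the DAG ids, and the 10-minute threshold are all hypothetical names and values chosen for the example, and an in-memory dict stands in for the database table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List


@dataclass
class SerializedRow:
    """Hypothetical stand-in for one row of the serialized DAG table."""
    dag_id: str
    last_updated: datetime  # when the scheduler last wrote this serialization


def remove_stale_rows(rows: Dict[str, SerializedRow], expiration: datetime) -> List[str]:
    """Delete rows last updated before `expiration`; return the removed dag_ids."""
    stale = sorted(dag_id for dag_id, row in rows.items() if row.last_updated < expiration)
    for dag_id in stale:
        del rows[dag_id]
    return stale


now = datetime(2020, 4, 27, 12, 0, tzinfo=timezone.utc)
rows = {
    "fresh_dag": SerializedRow("fresh_dag", now - timedelta(minutes=1)),
    "stale_dag": SerializedRow("stale_dag", now - timedelta(hours=2)),
}
# Assumed threshold: anything not refreshed in the last 10 minutes is stale.
removed = remove_stale_rows(rows, expiration=now - timedelta(minutes=10))
print(removed)
```

Because the decision depends only on when a DAG was last processed, it does not matter which `.py` file the DAG object was originally defined in.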


> Serialized DAGs can be incorrectly deleted
> ------------------------------------------
>
>                 Key: AIRFLOW-6796
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: serialization
>    Affects Versions: 1.10.9
>            Reporter: Matthew Bruce
>            Priority: Major
>             Fix For: 1.10.11
>
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags`,
> called from `DagFileProcessorManager.refresh_dag_dir`, can delete the 
> serializations of DAGs that were loaded via a DagBag and exposed through 
> globals in a different `.py` file.
> Consider something like this:
>  {{/home/airflow/dags/loader.py}}
> {code:python}
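> from airflow import models
>
> # Load DAGs from several project folders (paths are examples).
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags'))
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags'))
> for dag_bag in dag_bags:
>     # DagBag.dags maps dag_id -> DAG; expose each DAG at module level
>     # so the scheduler discovers it when it parses loader.py.
>     for dag in dag_bag.dags.values():
>         globals()[dag.dag_id] = dag
> {code}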
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>  
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}} 
> will contain only {{/home/airflow/dags/loader.py}}, so the method will 
> remove the serializations for the DAGs in dag-a.py and dag-b.py.
> With non-serialized DAGs, Airflow seems to mark DAGs as inactive based on 
> when the scheduler last processed them. I wonder if we should make these two 
> methods consistent?
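
The failure mode described in the quoted issue can be modelled in a few lines of plain Python. This is only a sketch under stated assumptions: `remove_by_known_paths` is a hypothetical stand-in for path-based deletion (it is not Airflow's implementation), the DAG ids `dag-a` and `dag-b` are invented for the example, and each serialized DAG is assumed to record the file it was defined in rather than the loader file.

```python
from typing import Dict, List, Set


def remove_by_known_paths(serialized: Dict[str, str], alive_paths: Set[str]) -> List[str]:
    """Drop entries whose recorded file is not in `alive_paths`.

    `serialized` maps dag_id -> path of the file the DAG was defined in.
    Returns the dag_ids that were removed.
    """
    removed = sorted(dag_id for dag_id, path in serialized.items() if path not in alive_paths)
    for dag_id in removed:
        del serialized[dag_id]
    return removed


# Assumption: each DAG records its defining file, not loader.py.
serialized = {
    "dag-a": "/home/airflow/project-a/dags/dag-a.py",
    "dag-b": "/home/airflow/project-b/dags/dag-b.py",
}
# Only the loader lives in the DAG folder, so it is the only path passed in.
alive_paths = {"/home/airflow/dags/loader.py"}

removed = remove_by_known_paths(serialized, alive_paths)
print(removed)
print(serialized)
```

Both entries are removed even though the DAGs are still being loaded, because neither recorded path appears in the set of files found in the DAG folder.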



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
