[ 
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090958#comment-17090958
 ] 

ASF GitHub Bot commented on AIRFLOW-6796:
-----------------------------------------

kaxil commented on a change in pull request #7424:
URL: https://github.com/apache/airflow/pull/7424#discussion_r414148815



##########
File path: airflow/models/serialized_dag.py
##########
@@ -144,22 +144,22 @@ def remove_dag(cls, dag_id: str, session=None):
 
     @classmethod
     @provide_session
-    def remove_deleted_dags(cls, alive_dag_filelocs: List[str], session=None):
-        """Deletes DAGs not included in alive_dag_filelocs.
-
-        :param alive_dag_filelocs: file paths of alive DAGs
-        :param session: ORM Session
+    def remove_stale_dags(cls, expiration_date, session=None):
         """
-        alive_fileloc_hashes = [
-            DagCode.dag_fileloc_hash(fileloc) for fileloc in 
alive_dag_filelocs]
+        Deletes Serialized DAGs that were last touched by the scheduler before
+        the expiration date.  These DAGs were likely deleted.

Review comment:
       ```suggestion
           the expiration date. These DAGs were likely deleted.
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Serialized DAGs can be incorrectly deleted
> ------------------------------------------
>
>                 Key: AIRFLOW-6796
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: serialization
>    Affects Versions: 1.10.9
>            Reporter: Matthew Bruce
>            Priority: Major
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags` 
> called from `DagFileProcessManager.refresh_dag_dir` can delete the 
> serialization of DAGs if they were loaded via a DagBag and globals in a 
> different `.py` file:
> Consider something like this:
>  {{/home/airflow/dags/loader.py}}
> {code:python}
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags')
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags')
> for dag_bag in dag_bags:
>     for dag in dag_bag:
>       globals()[dag.dag_id] = dag{code}
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>  
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}} 
> is only going to contain {{/home/airflow/dags/loader.py}} and the method will 
> remove the serializations for the DAGs in dag-a.py and dag-b.py
> With non-serialized DAGs, airflow seems to mark DAGs as inactive based on 
> when the scheduler last processed them - I wonder if we should make these two 
> methods consistent?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to