[ 
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037382#comment-17037382
 ] 

ASF GitHub Bot commented on AIRFLOW-6796:
-----------------------------------------

MatthewRBruce commented on pull request #7424: [AIRFLOW-6796] Clean up DAG 
serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424
 
 
   DAG serializations were previous deleted based on whether the
   DagFileProcessorManager had processed a particular python file.  This caused 
issues if DAGs were import via `globals()` via a different python file.  This
   changes the deletion behaviour to be based on the last time a DAG was 
processed by the
   scheduler instead.  
   
   This also moves cleaning up of stale DAGs from `SchedulerJob` to 
`DagFileProcessorManager`  to support long running schedulers
   
   ---
   Issue link: WILL BE INSERTED BY 
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   - [X] Description above provides context of the change
   - [X] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = 
JIRA ID<sup>*</sup>
   - [X] Unit tests coverage for changes (not needed for documentation changes)
   - [X] Commits follow "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)"
   - [X] Relevant documentation is updated including usage instructions.
   - [X] I will engage committers as explained in [Contribution Workflow 
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with 
`[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in 
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request 
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
 for more information.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Serialized DAGs can be incorrectly deleted
> ------------------------------------------
>
>                 Key: AIRFLOW-6796
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: serialization
>    Affects Versions: 1.10.9
>            Reporter: Matthew Bruce
>            Priority: Major
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags` 
> called from `DagFileProcessManager.refresh_dag_dir` can delete the 
> serialization of DAGs if they were loaded via a DagBag and globals in a 
> different `.py` file:
> Consider something like this:
>  {{/home/airflow/dags/loader.py}}
> {code:python}
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags')
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags')
> for dag_bag in dag_bags:
>     for dag in dag_bag:
>       globals()[dag.dag_id] = dag{code}
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>  
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}} 
> is only going to contain {{/home/airflow/dags/loader.py}} and the method will 
> remove the serializations for the DAGs in dag-a.py and dag-b.py
> With non-serialized DAGs, airflow seems to mark DAGs as inactive based on 
> when the scheduler last processed them - I wonder if we should make these two 
> methods consistent?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to