[
https://issues.apache.org/jira/browse/AIRFLOW-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037382#comment-17037382
]
ASF GitHub Bot commented on AIRFLOW-6796:
-----------------------------------------
MatthewRBruce commented on pull request #7424: [AIRFLOW-6796] Clean up DAG
serializations based on last_updated
URL: https://github.com/apache/airflow/pull/7424
DAG serializations were previously deleted based on whether the
DagFileProcessorManager had processed a particular Python file. This caused
issues if DAGs were imported via `globals()` from a different Python file. This
changes the deletion behaviour to be based on the last time a DAG was
processed by the scheduler instead.
This also moves cleaning up of stale DAGs from `SchedulerJob` to
`DagFileProcessorManager` to support long-running schedulers.
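
To illustrate the difference between the two deletion strategies, here is a minimal, self-contained sketch. It is not Airflow's implementation: the in-memory `ROWS` table, the helper names `stale_by_path` and `stale_by_last_updated`, the `old-dag` entry, and the 10-minute threshold are all hypothetical, chosen only to mirror the scenario in the issue. It also assumes each serialized DAG records the file in which it was originally defined.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the serialized DAG table:
# dag_id -> (fileloc, last_updated). Not Airflow's real schema.
NOW = datetime(2020, 2, 14, 12, 0, 0)
ROWS = {
    "dag-a": ("/home/airflow/project-a/dags/dag-a.py", NOW - timedelta(seconds=30)),
    "dag-b": ("/home/airflow/project-b/dags/dag-b.py", NOW - timedelta(seconds=30)),
    "old-dag": ("/home/airflow/dags/old.py", NOW - timedelta(hours=2)),
}

# Only the loader file lives in the DAG folder scanned by the file processor.
PROCESSED_PATHS = {"/home/airflow/dags/loader.py"}


def stale_by_path(rows, processed_paths):
    """Path-based cleanup: a DAG is stale if its file was not processed."""
    return sorted(
        dag_id
        for dag_id, (fileloc, _) in rows.items()
        if fileloc not in processed_paths
    )


def stale_by_last_updated(rows, now, max_age):
    """Timestamp-based cleanup: a DAG is stale if it was not refreshed recently."""
    return sorted(
        dag_id
        for dag_id, (_, last_updated) in rows.items()
        if now - last_updated > max_age
    )


# Path-based cleanup also flags the DAGs that loader.py re-exports.
print(stale_by_path(ROWS, PROCESSED_PATHS))
# Timestamp-based cleanup flags only the DAG that stopped being refreshed.
print(stale_by_last_updated(ROWS, NOW, timedelta(minutes=10)))
```

In this sketch the path-based check flags `dag-a` and `dag-b` even though the scheduler refreshed them seconds ago, while the timestamp-based check flags only the entry that has not been updated within the threshold.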
---
Issue link: WILL BE INSERTED BY
[boring-cyborg](https://github.com/kaxil/boring-cyborg)
Make sure to mark the boxes below before creating PR: [x]
- [X] Description above provides context of the change
- [X] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN =
JIRA ID<sup>*</sup>
- [X] Unit tests coverage for changes (not needed for documentation changes)
- [X] Commits follow "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)"
- [X] Relevant documentation is updated including usage instructions.
- [X] I will engage committers as explained in [Contribution Workflow
Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
<sup>*</sup> For document-only changes commit message can start with
`[AIRFLOW-XXXX]`.
---
In case of fundamental code change, Airflow Improvement Proposal
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals))
is needed.
In case of a new dependency, check compliance with the [ASF 3rd Party
License Policy](https://www.apache.org/legal/resolved.html#category-x).
In case of backwards incompatible changes please leave a note in
[UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
Read the [Pull Request
Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)
for more information.
----------------------------------------------------------------
> Serialized DAGs can be incorrectly deleted
> ------------------------------------------
>
> Key: AIRFLOW-6796
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
> Project: Apache Airflow
> Issue Type: Bug
> Components: serialization
> Affects Versions: 1.10.9
> Reporter: Matthew Bruce
> Priority: Major
>
> With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags`
> called from `DagFileProcessorManager.refresh_dag_dir` can delete the
> serialization of DAGs if they were loaded via a DagBag and globals in a
> different `.py` file.
> Consider something like this:
> {{/home/airflow/dags/loader.py}}
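> {code:python}
> from airflow import models
>
> dag_bags = []
> dag_bags.append(models.DagBag('/home/airflow/project-a/dags'))
> dag_bags.append(models.DagBag('/home/airflow/project-b/dags'))
> for dag_bag in dag_bags:
>     # DagBag.dags maps dag_id -> DAG; re-export each DAG from this module
>     for dag in dag_bag.dags.values():
>         globals()[dag.dag_id] = dag
> {code}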
> with files:
> {code:java}
> /home/airflow/project-a/dags/dag-a.py
> /home/airflow/project-b/dags/dag-b.py
> {code}
>
> The list of file paths passed to {{SerializedDagModel.remove_deleted_dags}}
> is only going to contain {{/home/airflow/dags/loader.py}}, so the method will
> remove the serializations for the DAGs in dag-a.py and dag-b.py.
> With non-serialized DAGs, Airflow seems to mark DAGs as inactive based on
> when the scheduler last processed them. I wonder if we should make these two
> methods consistent?