luos-fc opened a new issue, #30600:
URL: https://github.com/apache/airflow/issues/30600
### Apache Airflow version
2.5.3
### What happened
When a DAG is removed from a zip in the DAGs directory, but the zip file
remains, it is not correctly marked as inactive. It is still visible in the UI,
and attempting to open the DAG results in a `DAG "mydag" seems to be missing
from DagBag.` error in the UI.
The DAG is removed from the SerializedDag table, resulting in the scheduler
repeatedly erroring with `[2023-04-12T12:43:51.165+0000]
{scheduler_job.py:1063} ERROR - DAG 'mydag' not found in serialized_dag table`.
I have done some minor investigating and it appears that [this piece of
code](https://github.com/apache/airflow/blob/2.5.3/airflow/dag_processing/manager.py#L748-L772)
may be the cause.
`dag_filelocs` provides the path to a specific python file within a zip, so
`SerializedDagModel.remove_deleted_dags` is able to remove the missing DAG.
However, `self._file_paths` only contains the top-level zip name, so
`DagModel.deactivate_deleted_dags` will only deactivate DAGs where the zip they
are contained in is deleted, regardless of whether the DAG is still inside the
zip.
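To illustrate the mismatch, here is a minimal sketch in plain Python. It is a simplified model of the two comparisons, not Airflow's actual code: `zip_container`, `stale_by_zip`, `stale_by_file`, and the example paths are all hypothetical names made up for this illustration.

```python
import re


def zip_container(fileloc: str) -> str:
    """Hypothetical stand-in: map a path inside a zip to the zip itself."""
    match = re.match(r"(.*\.zip)/.+", fileloc)
    return match.group(1) if match else fileloc


def stale_by_zip(db_filelocs: dict, file_paths: list) -> set:
    """Zip-level check: a DAG is stale only if its enclosing zip is gone."""
    alive = set(file_paths)
    return {
        dag_id
        for dag_id, fileloc in db_filelocs.items()
        if zip_container(fileloc) not in alive
    }


def stale_by_file(db_filelocs: dict, dag_filelocs: list) -> set:
    """File-level check: a DAG is stale if its own file inside the zip is gone."""
    alive = set(dag_filelocs)
    return {
        dag_id
        for dag_id, fileloc in db_filelocs.items()
        if fileloc not in alive
    }


# DAG records as stored before the zip was overwritten (assumed example data).
db_filelocs = {
    "dag_a": "/dags/bundle.zip/dag_a.py",
    "mydag": "/dags/bundle.zip/mydag.py",
}
# After the overwrite, the zip still exists but only contains dag_a.py.
file_paths = ["/dags/bundle.zip"]
dag_filelocs = ["/dags/bundle.zip/dag_a.py"]

print(stale_by_zip(db_filelocs, file_paths))     # set(): mydag stays active
print(stale_by_file(db_filelocs, dag_filelocs))  # {'mydag'}
```

Under these assumptions, the zip-level check never flags `mydag` because its zip is still present, while the file-level check does.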
I can see there are [other methods that handle DAG
deactivation](https://github.com/apache/airflow/blob/2.5.3/airflow/models/dag.py#L2945-L2968)
and I'm not sure how these all interact but this does seem to cause this
specific issue.
### What you think should happen instead
DAGs that are no longer in the DagBag should be marked as inactive.
### How to reproduce
Running airflow locally with docker-compose:
- Create a zip file containing 2 DAG py files in ./dags
- Wait for the DAGs to be parsed by the scheduler and appear in the UI
- Overwrite the existing DAG zip, with a new zip containing only 1 of the
original DAG py files
- Wait for scheduler loop to parse the new zip
- Attempt to open the removed DAG in the UI; you will see an error
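
The two zip files used in these steps could be generated with something like the sketch below. The file name `bundle.zip`, the DAG ids `dag_a` and `mydag`, and the empty placeholder DAG body are assumptions for illustration; any two valid DAG files should behave the same way.

```python
import zipfile
from pathlib import Path

# Minimal placeholder DAG source (assumed content, no tasks).
DAG_TEMPLATE = '''\
from datetime import datetime

from airflow import DAG

with DAG(dag_id="{dag_id}", start_date=datetime(2023, 1, 1), schedule=None):
    pass
'''


def write_bundle(zip_path, dag_ids):
    """Create (or overwrite) a zip with one <dag_id>.py file per DAG id."""
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "w" truncates an existing zip, matching the "overwrite" step.
    with zipfile.ZipFile(zip_path, "w") as bundle:
        for dag_id in dag_ids:
            bundle.writestr(f"{dag_id}.py", DAG_TEMPLATE.format(dag_id=dag_id))
    return zip_path
```

Calling `write_bundle("dags/bundle.zip", ["dag_a", "mydag"])`, waiting for both DAGs to appear, and then calling `write_bundle("dags/bundle.zip", ["dag_a"])` corresponds to the create and overwrite steps.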
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else
If I replace the docker image in the docker compose with an image built from
this Dockerfile:
```
FROM apache/airflow:2.5.3
RUN sed -i '772s/self._file_paths/dag_filelocs/' \
    /home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py
RUN sed -i '3351s/correct_maybe_zipped(dag_model.fileloc)/dag_model.fileloc/' \
    /home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py
```
then the DAG is deactivated as expected.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)