luos-fc opened a new issue, #30600:
URL: https://github.com/apache/airflow/issues/30600

   ### Apache Airflow version
   
   2.5.3
   
   ### What happened
   
   When a DAG is removed from a zip in the DAGs directory but the zip file itself remains, the DAG is not correctly marked as inactive. It is still visible in the UI, and opening it results in a `DAG "mydag" seems to be missing from DagBag.` error.
   
   The DAG is removed from the SerializedDag table, so the scheduler repeatedly logs `[2023-04-12T12:43:51.165+0000] {scheduler_job.py:1063} ERROR - DAG 'mydag' not found in serialized_dag table`.
   
   I have done some minor investigating, and it appears that [this piece of code](https://github.com/apache/airflow/blob/2.5.3/airflow/dag_processing/manager.py#L748-L772) may be the cause.
   
   `dag_filelocs` provides the path to each specific Python file inside a zip, so `SerializedDagModel.remove_deleted_dags` is able to remove the missing DAG.
   
   However, `self._file_paths` only contains the top-level zip name, so `DagModel.deactivate_deleted_dags` only deactivates DAGs whose containing zip has been deleted. A DAG removed from a zip that still exists is never deactivated.
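A toy model of the two comparisons may make the mismatch easier to see. It is a sketch, not Airflow code: `to_zip_path` is a hypothetical, simplified stand-in for `correct_maybe_zipped` (it performs no real zip check), the two helper functions are hypothetical, and the example paths assume the `/opt/airflow/dags` folder of the official image and that each stored `fileloc` is the path of the `.py` file inside the zip.

```python
from typing import Iterable, List


def to_zip_path(fileloc: str) -> str:
    """Hypothetical, simplified stand-in for correct_maybe_zipped.

    Maps '/dags/x.zip/inner.py' to '/dags/x.zip'; other paths pass through.
    Unlike the real helper, it does not check that the archive is a zip file.
    """
    idx = fileloc.find(".zip/")
    return fileloc[: idx + len(".zip")] if idx != -1 else fileloc


def serialized_dags_to_remove(stored: Iterable[str], dag_filelocs: Iterable[str]) -> List[str]:
    # Compared against per-file paths inside the zip, so a DAG dropped from the zip is caught.
    alive = set(dag_filelocs)
    return [f for f in stored if f not in alive]


def dag_models_to_deactivate(stored: Iterable[str], file_paths: Iterable[str]) -> List[str]:
    # Compared against top-level paths only, after collapsing each fileloc to its zip.
    alive = set(file_paths)
    return [f for f in stored if to_zip_path(f) not in alive]


ZIP = "/opt/airflow/dags/mydags.zip"  # assumed path of the DAG zip
stored = [f"{ZIP}/keptdag.py", f"{ZIP}/mydag.py"]  # filelocs recorded before the overwrite
dag_filelocs = [f"{ZIP}/keptdag.py"]  # what the new zip actually yields
file_paths = [ZIP]  # the zip itself still exists

print(serialized_dags_to_remove(stored, dag_filelocs))  # ['/opt/airflow/dags/mydags.zip/mydag.py']
print(dag_models_to_deactivate(stored, file_paths))  # []
```

In this model the serialized entry for `mydag` is removed while its DAG model entry stays active, which is the inconsistency behind both the scheduler error and the UI error.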
   
   I can see there are [other methods that handle DAG deactivation](https://github.com/apache/airflow/blob/2.5.3/airflow/models/dag.py#L2945-L2968), and I'm not sure how they all interact, but this code does seem to cause this specific issue.
   
   ### What you think should happen instead
   
   DAGs that are no longer in the DagBag should be marked as inactive.
   
   ### How to reproduce
   
   Running Airflow locally with docker-compose:
   - Create a zip file containing 2 DAG `.py` files in `./dags`
   - Wait for the scheduler to parse the DAGs and for them to appear in the UI
   - Overwrite the existing DAG zip with a new zip containing only 1 of the original DAG `.py` files
   - Wait for the scheduler loop to parse the new zip
   - Attempt to open the removed DAG in the UI; you will see an error
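The zip-building steps can be scripted. This is a minimal sketch using Python's standard `zipfile` module; the archive name `mydags.zip`, the DAG ids `keptdag` and `mydag`, and the DAG file contents are hypothetical placeholders, not the files from the original report.

```python
import zipfile
from pathlib import Path
from typing import Iterable

# Hypothetical minimal DAG file; {dag_id} is filled in per file.
DAG_TEMPLATE = '''\
import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="{dag_id}", start_date=datetime.datetime(2023, 1, 1), schedule=None):
    EmptyOperator(task_id="noop")
'''


def write_dag_zip(zip_path: Path, dag_ids: Iterable[str]) -> None:
    """Write one <dag_id>.py per id into zip_path, replacing any existing archive."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "w" truncates an existing archive, mirroring "overwrite the existing DAG zip".
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dag_id in dag_ids:
            zf.writestr(f"{dag_id}.py", DAG_TEMPLATE.format(dag_id=dag_id))
```

Usage: call `write_dag_zip(Path("dags/mydags.zip"), ["keptdag", "mydag"])`, wait for both DAGs to appear in the UI, then call `write_dag_zip(Path("dags/mydags.zip"), ["keptdag"])` to overwrite the archive with only one of them.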
   
   
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   If I replace the Docker image in the docker-compose file with an image built from this Dockerfile:
   
   ```
   FROM apache/airflow:2.5.3
   RUN sed -i '772s/self._file_paths/dag_filelocs/' \
       /home/airflow/.local/lib/python3.7/site-packages/airflow/dag_processing/manager.py
   RUN sed -i '3351s/correct_maybe_zipped(dag_model.fileloc)/dag_model.fileloc/' \
       /home/airflow/.local/lib/python3.7/site-packages/airflow/models/dag.py
   ```
   
   then the removed DAG is deactivated as expected.
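The two `sed` edits appear to be needed together. The toy model below uses hypothetical names and assumes that the line-3351 check compares each DAG's `fileloc`, after `correct_maybe_zipped`, against the list passed in from `manager.py`. It compares applying only the first edit with applying both; `to_zip_path` is a simplified stand-in for `correct_maybe_zipped` that does no real zip check.

```python
from typing import Iterable, List


def to_zip_path(fileloc: str) -> str:
    # Hypothetical simplified stand-in for correct_maybe_zipped:
    # '/dags/x.zip/inner.py' -> '/dags/x.zip'; other paths pass through unchanged.
    idx = fileloc.find(".zip/")
    return fileloc[: idx + len(".zip")] if idx != -1 else fileloc


def to_deactivate(stored: Iterable[str], alive: Iterable[str], collapse_zip: bool) -> List[str]:
    # collapse_zip=True models the unpatched comparison in dag.py;
    # collapse_zip=False models the second sed (compare the raw fileloc).
    alive_set = set(alive)
    return [f for f in stored if (to_zip_path(f) if collapse_zip else f) not in alive_set]


ZIP = "/opt/airflow/dags/mydags.zip"  # assumed path of the DAG zip
stored = [f"{ZIP}/keptdag.py", f"{ZIP}/mydag.py", "/opt/airflow/dags/plain.py"]
dag_filelocs = [f"{ZIP}/keptdag.py", "/opt/airflow/dags/plain.py"]  # first sed: per-file paths

# Only the first sed applied
print(to_deactivate(stored, dag_filelocs, collapse_zip=True))
# ['/opt/airflow/dags/mydags.zip/keptdag.py', '/opt/airflow/dags/mydags.zip/mydag.py']

# Both seds applied
print(to_deactivate(stored, dag_filelocs, collapse_zip=False))
# ['/opt/airflow/dags/mydags.zip/mydag.py']
```

With only the first edit, every zipped `fileloc` collapses to the zip path, which never appears in a list of per-file paths, so even the DAG still inside the zip would be deactivated. Plain `.py` files behave the same either way.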
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

