akomisarek opened a new issue, #50890: URL: https://github.com/apache/airflow/issues/50890
### Apache Airflow version

Other Airflow 2 version (please specify below)

### If "Other Airflow 2 version" selected, which one?

2.10.x

### What happened?

We were aware of this change (https://github.com/apache/airflow/pull/38891) while upgrading to 2.10; it solves some of the problems/expectations reported in https://github.com/apache/airflow/issues/38826. Unfortunately, it broke for us in quite weird circumstances. We are working on improvements on our end, but we believe this is actually unexpected behaviour of the feature.

We have many DAGs in a single instance (close to 2k) and a deployment process to K8s with an image that does not contain the DAGs; they are only synced afterwards. We also have the standalone `dag_processor` enabled. One of two things happens:

* The `dag_processor` decides to deactivate DAGs that have not been parsed for some time, due to the number of DAGs/slow parsing.
* The DAGs are deactivated during Airflow startup.

At that moment, any `dataset`-producing task that finishes while the downstream Dataset-scheduled DAG is deactivated will produce an event that is ignored. The event is only picked up during the next upstream execution, which can lead to delays. I was reading the conversation about the `catchup` flag not being used, and I feel this is the problem for us: it intuitively doesn't make sense for the event to be ignored, i.e. I wouldn't expect the DAG to pick up additional Dataset events only on a subsequent trigger, but rather as soon as possible.

### What you think should happen instead?

I believe one of two things should happen:

* If `catchup` is configured, the DAG should be scheduled immediately when it is activated/appears. This would be consistent with the behaviour of time-based scheduling and the catchup parameter.
* OR the change introduced in https://github.com/apache/airflow/pull/38891 could be relaxed to still trigger deactivated DAGs (maybe only paused ones should be ignored?).

Any other ideas?
We are obviously working on our end to avoid long parsing times/deactivations, but I believe this behaviour is quite confusing. It was quite challenging to spot/troubleshoot and led to daily data delays (longer in some instances if you were extremely unlucky). Is Airflow 3 handling this any better?

### How to reproduce

I believe our scenario can be reproduced by having a Dataset-aware DAG and an appropriate consumer. Remove the consumer while the upstream job is triggered, then re-add it upon completion. The `dataset` event won't cause the downstream DAG to execute, but the next upstream execution will trigger the downstream DAG and pass two events in one execution.

### Operating System

K8s build from base images

### Versions of Apache Airflow Providers

N/A - can be reproduced on raw Airflow.

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Helmfile to K8s

### Anything else?

The root cause is the stale-DAG cleanup in the DAG processor:

```
[2025-05-03T02:09:03.673+0000] {manager.py:537} INFO - DAG {DAG_ID} is missing and will be deactivated.
[2025-05-03T02:09:03.678+0000] {manager.py:549} INFO - Deactivated 1 DAGs which are no longer present in file.
[2025-05-03T02:09:03.688+0000] {manager.py:553} INFO - Deleted DAG {DAG_ID} in serialized_dag table
```

With our current setup this happens at scale every couple of days.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
