kirillsights opened a new issue, #55315: URL: https://github.com/apache/airflow/issues/55315
### Apache Airflow version

3.0.6

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

After upgrading to Airflow 3, the system started experiencing random DAG disappearances. Parsing intervals are set up to be quite long, because we don't update DAGs between deploys. The interval configuration looks like this:

```
dag_processor:
  dag_file_processor_timeout: 300
  min_file_process_interval: 7200
  parsing_processes: 1
  print_stats_interval: 300
  refresh_interval: 1800
  stale_dag_threshold: 1800
```

Log analysis showed that once the DAG processor receives a callback for any DAG, that DAG is soon marked as stale and disappears. It may come back later, once the process interval kicks in, but that is not always the case.

Full log: [dag_processor.log.zip](https://github.com/user-attachments/files/22181554/dag_processor.log.zip)

Points of interest in the log:

The last stats line where this particular DAG's file shows no error:

```
2025-09-04T20:02:57.426Z | {"log":"2025-09-04T20:02:57.426093587Z stdout F dags-folder process_etl_app_data.py 1 0 0.96s 2025-09-04T19:58:39"}
```

Then the first callback for it comes in:

```
2025-09-04T20:05:08.722Z | {"log":"2025-09-04T20:05:08.722840445Z stdout F [2025-09-04T20:05:08.722+0000] {manager.py:464} DEBUG - Queuing TaskCallbackRequest CallbackRequest: filepath='process_etl_app_data.py' bundle_name='dags-folder' bundle_version=None msg=\"{'DAG Id': 'ds_etl', 'Task Id': 'etl_app_data', 'Run Id': 'manual__2025-09-04T20:00:00+00:00', 'Hostname': '10.4.142.168', 'External Executor Id': '5547a318-f6cc-4c02-92f5-90cbbb629e22'}\" ti=TaskInstance(id=UUID('01991650-8c36-70c5-a85b-44f6b572fe0f'), task_id='etl_app_data', dag_id='ds_etl', run_id='manual__2025-09-04T20:00:00+00:00', try_number=1, map_index=-1, hostname='10.4.142.168', context_carrier=None) task_callback_type=None context_from_server=TIRunContext(dag_run=DagRun(dag_id='ds_etl', run_id='manual__2025-09-04T20:00:00+00:00', logical_date=datetime.datetime(2025, 9, 4, 20, 0, tzinfo=Timezone('UTC')), data_interval_start=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909, tzinfo=Timezone('UTC')), data_interval_end=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909, tzinfo=Timezone('UTC')), run_after=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909, tzinfo=Timezone('UTC')), start_date=datetime.datetime(2025, 9, 4, 20, 0, 1, 176556, tzinfo=Timezone('UTC')), end_date=None, clear_number=0, run_type=<DagRunType.MANUAL: 'manual'>, state=<DagRunState.RUNNING: 'running'>, conf={}, consumed_asset_events=[]), task_reschedule_count=0, max_tries=7, variables=[], connections=[], upstream_map_indexes=None, next_method=None, next_kwargs=None, xcom_keys_to_clear=[], should_retry=False) type='TaskCallbackRequest'"}
```

Then, during the next stats print, this file shows an error (even though it has not changed at all):

```
2025-09-04T20:12:58.040Z | {"log":"2025-09-04T20:12:58.040610948Z stdout F dags-folder process_etl_app_data.py 0 1 1.01s 2025-09-04T20:12:50"}
```

Eventually the DAG from that file disappears:

```
2025-09-04T20:57:53.765Z | {"log":"2025-09-04T20:57:53.765305682Z stdout F [2025-09-04T20:57:53.764+0000] {manager.py:310} INFO - DAG ds_etl is missing and will be deactivated."}
```

Further analysis suggests that the DAG processor reuses the same parsing mechanism for callback execution: it updates the file's parsing time but does not update the DAG's parsing time, so the DAG eventually becomes stale.
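To illustrate what I believe is happening, here is a rough sketch. The names (`FileStat`, `SerializedDagRecord`, `would_be_deactivated`) are mine, not the actual Airflow internals: the stale check effectively compares when the DAG itself was last parsed against when its file was last processed, and a callback-only run refreshes the file record without refreshing the DAG record.

```python
# Illustrative sketch only -- FileStat, SerializedDagRecord and
# would_be_deactivated are made-up names, not Airflow's real internals.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FileStat:
    # Updated every time the file is processed, including callback-only runs.
    last_finish_time: datetime


@dataclass
class SerializedDagRecord:
    dag_id: str
    # Only refreshed when the DAG is actually re-parsed and re-serialized.
    last_parsed_time: datetime


def would_be_deactivated(dag: SerializedDagRecord, file: FileStat, stale_dag_threshold: int) -> bool:
    """True when the DAG looks stale relative to its file's latest processing run."""
    return dag.last_parsed_time < file.last_finish_time - timedelta(seconds=stale_dag_threshold)


now = datetime.now(timezone.utc)
dag = SerializedDagRecord("ds_etl", last_parsed_time=now - timedelta(hours=1))  # last real parse
file = FileStat(last_finish_time=now)  # a callback was just processed for this file
print(would_be_deactivated(dag, file, stale_dag_threshold=1800))
# True -> "DAG ds_etl is missing and will be deactivated"
```

With `min_file_process_interval: 7200` the DAG is not re-parsed for two hours, so a single callback is enough to push it past the 1800-second `stale_dag_threshold`.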
### What you think should happen instead?

Processing callbacks should not affect DAG state. And I think we should still be able to set long reparsing intervals for infrequent parsing.

### How to reproduce

- Have a DAG with callbacks (a minimal sketch is included at the end of this issue).
- Set `min_file_process_interval` higher than `stale_dag_threshold` and deploy Airflow.
- Execute the DAG so that callbacks are executed.

### Operating System

Debian Bookworm

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.12.0
apache-airflow-providers-celery==3.12.2
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-messaging==1.0.5
apache-airflow-providers-common-sql==1.27.5
apache-airflow-providers-fab==2.4.1
apache-airflow-providers-http==5.3.3
apache-airflow-providers-postgres==6.2.3
apache-airflow-providers-redis==4.2.0
apache-airflow-providers-slack==9.1.4
apache-airflow-providers-smtp==2.2.0
apache-airflow-providers-standard==1.6.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Helm chart deployed on an AWS EKS cluster

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
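### Minimal reproduction sketch

For completeness, a DAG along these lines (the dag_id, task_id and callback body are placeholders, not my real `ds_etl` DAG) should be enough to hit the callback path described above:

```python
# Minimal reproduction sketch -- ids and the callback body are placeholders;
# the only requirement is that callbacks are attached to the task.
import pendulum

from airflow.sdk import DAG
from airflow.providers.standard.operators.python import PythonOperator


def notify(context):
    # Any callback body works; in my setup these callbacks show up as
    # TaskCallbackRequests in the DAG processor log.
    print(f"callback fired for {context['task_instance'].task_id}")


with DAG(
    dag_id="callback_repro",
    start_date=pendulum.datetime(2025, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
):
    PythonOperator(
        task_id="do_work",
        python_callable=lambda: print("working"),
        on_success_callback=notify,
        on_failure_callback=notify,
    )
```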