kirillsights opened a new issue, #55315:
URL: https://github.com/apache/airflow/issues/55315
### Apache Airflow version
3.0.6
### If "Other Airflow 2 version" selected, which one?
_No response_
### What happened?
After upgrading to Airflow 3, our system started experiencing random DAG
disappearances.
Parsing intervals are set fairly long, because we don't update DAGs between
deploys.
The interval-related config looks like this:
```
dag_processor:
  dag_file_processor_timeout: 300
  min_file_process_interval: 7200
  parsing_processes: 1
  print_stats_interval: 300
  refresh_interval: 1800
  stale_dag_threshold: 1800
```
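With these values, a DAG file is reparsed at most every two hours, while (if I read `stale_dag_threshold` correctly) a DAG is considered stale only 30 minutes after its file was last processed. A back-of-the-envelope illustration in plain Python (not Airflow code):
```python
# Values taken from the config above.
min_file_process_interval = 7200  # file is reparsed at most every 2 hours
stale_dag_threshold = 1800        # DAG treated as stale 30 minutes after the file's last parse

# If something refreshes the file's "last processed" bookkeeping without
# re-serializing the DAG, the DAG can be deactivated long before the next
# scheduled reparse would bring it back.
window = min_file_process_interval - stale_dag_threshold
print(f"DAG can stay deactivated for up to {window} s before the next reparse")  # 5400 s
```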
Log analysis showed that once the DAG processor receives a single callback for
any DAG, that DAG is soon marked as stale and disappears.
It may come back later, once the process interval kicks in, but that is not
always the case.
Full log:
[dag_processor.log.zip](https://github.com/user-attachments/files/22181554/dag_processor.log.zip)
Points of interest in the log:
The last stats entry with no error for the affected DAG:
```
2025-09-04T20:02:57.426Z | {"log":"2025-09-04T20:02:57.426093587Z stdout F
dags-folder process_etl_app_data.py 1 0 0.96s 2025-09-04T19:58:39"}
```
Then the first callback for it comes in:
```
2025-09-04T20:05:08.722Z | {"log":"2025-09-04T20:05:08.722840445Z stdout F
[2025-09-04T20:05:08.722+0000] {manager.py:464} DEBUG - Queuing
TaskCallbackRequest CallbackRequest: filepath='process_etl_app_data.py'
bundle_name='dags-folder' bundle_version=None msg=\"{'DAG Id': 'ds_etl', 'Task
Id': 'etl_app_data', 'Run Id': 'manual__2025-09-04T20:00:00+00:00', 'Hostname':
'10.4.142.168', 'External Executor Id':
'5547a318-f6cc-4c02-92f5-90cbbb629e22'}\"
ti=TaskInstance(id=UUID('01991650-8c36-70c5-a85b-44f6b572fe0f'),
task_id='etl_app_data', dag_id='ds_etl',
run_id='manual__2025-09-04T20:00:00+00:00', try_number=1, map_index=-1,
hostname='10.4.142.168', context_carrier=None) task_callback_type=None
context_from_server=TIRunContext(dag_run=DagRun(dag_id='ds_etl',
run_id='manual__2025-09-04T20:00:00+00:00',
logical_date=datetime.datetime(2025, 9, 4, 20, 0, tzinfo=Timezone('UTC')),
data_interval_start=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909,
tzinfo=Timezone('UTC')), data_interval_end
=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909, tzinfo=Timezone('UTC')),
run_after=datetime.datetime(2025, 9, 4, 20, 0, 1, 133909,
tzinfo=Timezone('UTC')), start_date=datetime.datetime(2025, 9, 4, 20, 0, 1,
176556, tzinfo=Timezone('UTC')), end_date=None, clear_number=0,
run_type=<DagRunType.MANUAL: 'manual'>, state=<DagRunState.RUNNING: 'running'>,
conf={}, consumed_asset_events=[]), task_reschedule_count=0, max_tries=7,
variables=[], connections=[], upstream_map_indexes=None, next_method=None,
next_kwargs=None, xcom_keys_to_clear=[], should_retry=False)
type='TaskCallbackRequest'"}
```
Then, at the next stats print, the file shows an error (even though it has not
changed at all):
```
2025-09-04T20:12:58.040Z | {"log":"2025-09-04T20:12:58.040610948Z stdout F
dags-folder process_etl_app_data.py 0 1 1.01s 2025-09-04T20:12:50"}
```
Eventually the DAG from that file disappears:
```
2025-09-04T20:57:53.765Z | {"log":"2025-09-04T20:57:53.765305682Z stdout F
[2025-09-04T20:57:53.764+0000] {manager.py:310} INFO - DAG ds_etl is missing
and will be deactivated."}
```
Further analysis showed that the DAG processor seems to reuse the same parsing
mechanism for callback execution: it updates the file's parse time but does not
update the DAG's parse time, so the DAG eventually becomes stale.
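A hedged sketch of the suspected interaction (the names below are illustrative, not the actual manager internals): if the staleness check compares the DAG's last-parse time against the file's last-processed time, then a callback run that only bumps the file timestamp makes the DAG look stale.
```python
from datetime import datetime, timedelta, timezone

# Illustrative model of the suspected staleness check; names are hypothetical,
# not the real Airflow internals.
STALE_DAG_THRESHOLD = timedelta(seconds=1800)

def looks_stale(dag_last_parsed: datetime, file_last_processed: datetime) -> bool:
    # The DAG looks "missing" when the file was processed recently but the
    # DAG's own parse record was not refreshed within the threshold.
    return file_last_processed - dag_last_parsed > STALE_DAG_THRESHOLD

now = datetime.now(timezone.utc)
dag_last_parsed = now - timedelta(hours=1)  # last real parse, well within min_file_process_interval
file_last_processed = now                   # callback execution just touched the file
print(looks_stale(dag_last_parsed, file_last_processed))  # True -> "DAG ... will be deactivated"
```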
### What you think should happen instead?
Processing callbacks should not affect DAG state (in particular, staleness
tracking).
And we should still be able to set long reparsing intervals for infrequent
parsing.
### How to reproduce
- Have a DAG with callbacks (a minimal example is sketched after this list)
- Set `min_file_process_interval` higher than `stale_dag_threshold` and deploy Airflow
- Execute the DAG so that its callbacks are triggered
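A minimal, hypothetical DAG file for the reproduction (DAG id, task id, and callback body are illustrative; any task-level callback should do):
```python
from datetime import datetime

from airflow.sdk import DAG
from airflow.providers.standard.operators.empty import EmptyOperator


def on_success(context):
    # Any callback is enough to make the scheduler enqueue a TaskCallbackRequest
    # for the DAG processor, as seen in the log above.
    print("task succeeded:", context["ti"].task_id)


with DAG(
    dag_id="ds_etl_repro",
    start_date=datetime(2025, 9, 1),
    schedule=None,
    catchup=False,
):
    EmptyOperator(task_id="etl_app_data", on_success_callback=on_success)
```
After a manual run of this DAG, watch the DAG processor stats: the file's parse time advances while the DAG itself is later reported as missing and deactivated.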
### Operating System
Debian Bookworm
### Versions of Apache Airflow Providers
apache-airflow-providers-amazon==9.12.0
apache-airflow-providers-celery==3.12.2
apache-airflow-providers-common-compat==1.7.3
apache-airflow-providers-common-io==1.6.2
apache-airflow-providers-common-messaging==1.0.5
apache-airflow-providers-common-sql==1.27.5
apache-airflow-providers-fab==2.4.1
apache-airflow-providers-http==5.3.3
apache-airflow-providers-postgres==6.2.3
apache-airflow-providers-redis==4.2.0
apache-airflow-providers-slack==9.1.4
apache-airflow-providers-smtp==2.2.0
apache-airflow-providers-standard==1.6.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Helm chart deployed on an AWS EKS cluster
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)