aspirepadma-dot opened a new issue, #63926:
URL: https://github.com/apache/airflow/issues/63926
### Apache Airflow version
3.1.8
### If "Other Airflow 3 version" selected, which one?
3.0.6, 3.1.7 and 3.1.8
### What happened?
The Airflow scheduler consistently crashes after several hours of healthy
operation (approx. 8 hours). We have 74 DAGs enabled and they are processed
normally at first. After a few hours, once roughly 5k DAG runs have been
reached, the scheduler becomes unhealthy while its process keeps running. The
crash surfaces as a sqlalchemy.orm.exc.StaleDataError, which leads to a service
failure and a subsequent OOM kill by the OS. This has been observed on
versions 3.0.6, 3.1.7, and 3.1.8.
1-177-248.ec2.internal airflow[2057]: [2026-03-13T13:11:51.969771Z]
{scheduler_job_runner.py:744} INFO - Trying to enqueue tasks: [<TaskInstance:
transient_emr_census_snapshot_reporting_initial.watch_census_snapshot_reporting_initial
scheduled__2026-03-13T13:00:00+00:00 [scheduled]>] for executor:
LocalExecutor(parallelism=128)
Mar 13 13:11:51 ip-10-11-177-248.ec2.internal airflow[2493]:
[2026-03-13T13:11:51.973285Z] {supervisor.py:1975} INFO - Secrets backends
loaded for worker count=2 backend_classes=['EnvironmentVariablesBackend',
'MetastoreBackend']
Mar 13 13:11:53 ip-10-11-177-248.ec2.internal airflow[2481]:
[2026-03-13T13:11:53.717939Z] {supervisor.py:1995} INFO - Task finished
task_instance_id=019ce749-1591-79b1-b844-b1c7427e1b05 exit_code=0
duration=181.57094679700094 final_state=success
Mar 13 13:11:55 ip-10-11-177-248.ec2.internal airflow[2519]:
[2026-03-13T13:11:55.546757Z] {supervisor.py:1995} INFO - Task finished
task_instance_id=019ce752-1476-7598-9396-0b1c85b55e29 exit_code=0
duration=31.381990759000473 final_state=up_for_retry
Mar 13 13:11:56 ip-10-11-177-248.ec2.internal airflow[2526]:
[2026-03-13T13:11:56.209305Z] {supervisor.py:1995} INFO - Task finished
task_instance_id=019ce752-147c-755d-ab97-469fb7883f1e exit_code=0
duration=61.36306266200336 final_state=success
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
[2026-03-13T13:11:57.953125Z] {scheduler_job_runner.py:1086} ERROR - Exception
when executing SchedulerJob._run_scheduler_loop
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: Traceback (most
recent call last):
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 1082, in _execute
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
self._run_scheduler_loop()
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 1372, in _run_scheduler_loop
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
num_queued_tis = self._do_scheduling(session)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 1482, in _do_scheduling
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
callback_tuples = self._schedule_all_dag_runs(guard, dag_runs, session)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/retries.py",
line 97, in wrapped_function
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: for attempt
in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
line 438, in __iter__
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: do =
self.iter(retry_state=retry_state)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
line 371, in iter
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: result =
action(retry_state)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
line 393, in <lambda>
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
self._add_action_func(lambda rs: rs.outcome.result())
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: return
self.__get_result()
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: raise
self._exception
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/retries.py",
line 106, in wrapped_function
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: return
func(*args, **kwargs)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 1924, in _schedule_all_dag_runs
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
callback_tuples = [(run, self._schedule_dag_run(run, session=session)) for run
in dag_runs]
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 2028, in _schedule_dag_run
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: if not
dag_run.bundle_version and not self._verify_integrity_if_dag_changed(
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
line 2092, in _verify_integrity_if_dag_changed
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
dag_run.verify_integrity(dag_version_id=latest_dag_version.id, session=session)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/session.py",
line 98, in wrapper
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: return
func(*args, **kwargs)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/models/dagrun.py",
line 1694, in verify_integrity
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
self._create_task_instances(self.dag_id, tis_to_create, created_counts,
hook_is_noop, session=session)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/models/dagrun.py",
line 1901, in _create_task_instances
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
session.flush()
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
line 4331, in flush
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
self._flush(objects)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
line 4466, in _flush
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: with
util.safe_reraise():
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
^^^^^^^^^^^^^^^^^^^
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/util/langhelpers.py",
line 224, in __exit__
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: raise
exc_value.with_traceback(exc_tb)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
line 4427, in _flush
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
flush_context.execute()
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
line 466, in execute
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
rec.execute(self)
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
line 642, in execute
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
util.preloaded.orm_persistence.save_obj(
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/persistence.py",
line 85, in save_obj
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
_emit_update_statements(
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: File
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/persistence.py",
line 948, in _emit_update_statements
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: raise
orm_exc.StaleDataError(
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'task_instance'
expected to update 39 row(s); 38 were matched.
Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:
[2026-03-13T13:11:57.967698Z] {local_executor.py:252} INFO - Shutting down
LocalExecutor; waiting for running tasks to finish. Signal again if you don't
want to wait.
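To make the error message at the end of the traceback easier to read: it reports that a batch UPDATE matched fewer rows than the session expected (39 expected, 38 matched), which typically means one of the rows was changed or removed by something else between load and flush. The sketch below imitates only that row-count check using the standard-library `sqlite3` module. It is not SQLAlchemy's or Airflow's code; `StaleRowError`, `update_states` and `demo` are hypothetical names, and the deleted row is an assumed trigger, not a confirmed root cause for this issue.

```python
import sqlite3


class StaleRowError(Exception):
    """Hypothetical stand-in for sqlalchemy.orm.exc.StaleDataError."""


def update_states(conn, ti_ids, new_state):
    """Update each row by primary key and verify that all of them matched."""
    matched = 0
    for ti_id in ti_ids:
        cur = conn.execute(
            "UPDATE task_instance SET state = ? WHERE id = ?",
            (new_state, ti_id),
        )
        # rowcount is 0 when the row no longer exists.
        matched += cur.rowcount
    if matched != len(ti_ids):
        raise StaleRowError(
            "UPDATE statement on table 'task_instance' expected to update "
            f"{len(ti_ids)} row(s); {matched} were matched."
        )
    return matched


def demo():
    """Load 39 rows, remove one behind the caller's back, then update all 39."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance (id INTEGER PRIMARY KEY, state TEXT)")
    ids = list(range(1, 40))  # 39 task instance ids held "in memory"
    conn.executemany(
        "INSERT INTO task_instance (id, state) VALUES (?, ?)",
        [(i, "scheduled") for i in ids],
    )
    # Assumption for illustration: a concurrent writer deletes one row.
    conn.execute("DELETE FROM task_instance WHERE id = 7")
    try:
        update_states(conn, ids, "queued")
    except StaleRowError as exc:
        return str(exc)
    return None
```

Calling `demo()` returns a message of the same shape as the one in the log above, because the caller still holds 39 ids while only 38 rows remain in the table.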
### What you think should happen instead?
Instead, the scheduler should shut down and restart automatically.
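As a rough illustration of the expected behaviour, the sketch below restarts a command whenever it exits with a non-zero code. It is only a sketch under assumptions: `run_with_restart` and its parameters are hypothetical, it is not part of Airflow, and in a real deployment a service manager (for example systemd with `Restart=on-failure`) would normally do this job. It also only helps if the scheduler process actually exits rather than lingering in an unhealthy state.

```python
import subprocess
import sys
import time


def run_with_restart(cmd, max_restarts=3, delay_seconds=0.0):
    """Run cmd and restart it after a non-zero exit.

    Returns the list of exit codes, one per attempt.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown, nothing to restart
        if attempt < max_restarts:
            time.sleep(delay_seconds)  # back off before the next start
    return exit_codes


if __name__ == "__main__":
    # Stand-in for the scheduler: a child process that always fails.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(run_with_restart(failing, max_restarts=1))
```

Capping the number of restarts keeps a crash loop visible instead of hiding it behind endless retries.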
### How to reproduce
Spin up a new Airflow instance, sync the DAGs, enable them, and leave it
running. After about 6 or 7 hours the scheduler suddenly becomes unhealthy.
### Operating System
linux
### Versions of Apache Airflow Providers
This output is from 3.0.6; the same behaviour occurs in 3.1.7 and 3.1.8.
Apache Airflow
version | 3.0.6
executor | LocalExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn |
postgresql+psycopg2://postgres:19ffc405539a19c5a023b933@localhost:5432/airflowdb_13
dags_folder | /home/airflow/airflow/dags
plugins_folder | /home/airflow/airflow/plugins
base_log_folder | /home/airflow/airflow/logs
remote_base_log_folder | s3://tt-dp-airflow-us-east-1-stg-resources-13/logs/
System info
OS | Linux
architecture | arm
uname | uname_result(system='Linux',
node='ip-10-11-130-200.ec2.internal',
release='6.1.141-155.222.amzn2023.aarch64', version='#1 SMP Tue Jun 17 10:29:19
UTC 2025', machine='aarch64')
locale | ('C', 'UTF-8')
python_version | 3.12.10 (main, Jun 4 2025, 00:00:00) [GCC 11.5.0 20240719
(Red Hat 11.5.0-5)]
python_location | /home/airflow/airflow_venv/bin/python3.12
Tools info
git | git version 2.47.1
ssh | OpenSSH_8.7p1, OpenSSL 3.2.2 4 Jun 2024
kubectl | NOT AVAILABLE
gcloud | NOT AVAILABLE
cloud_sql_proxy | NOT AVAILABLE
mysql | NOT AVAILABLE
sqlite3 | NOT AVAILABLE
psql | psql (PostgreSQL) 17.5
Paths info
airflow_home | /home/airflow/airflow
system_path |
/home/airflow/.local/bin:/home/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/snapd/snap/bin
python_path |
/home/airflow/airflow_venv/bin:/usr/lib64/python312.zip:/usr/lib64/python3.12:/usr/lib64/python3.12/lib-dynload:/home/airflow/airflow_venv/lib64/python3.12/site-packages:/home/airflow/airflow_venv/lib/python3.12/site-packages:/home/airflow/airflow/config:/home/airflow/airflow/plugins
airflow_on_path | True
Providers info
apache-airflow-providers-amazon | 9.12.0
apache-airflow-providers-apache-livy | 4.4.2
apache-airflow-providers-apache-spark | 5.3.2
apache-airflow-providers-celery | 3.12.2
apache-airflow-providers-cncf-kubernetes | 10.7.0
apache-airflow-providers-common-compat | 1.7.3
apache-airflow-providers-common-io | 1.6.2
apache-airflow-providers-common-sql | 1.27.5
apache-airflow-providers-fab | 2.4.1
apache-airflow-providers-http | 5.3.3
apache-airflow-providers-postgres | 6.2.3
apache-airflow-providers-smtp | 2.2.0
apache-airflow-providers-ssh | 4.1.3
apache-airflow-providers-standard | 1.6.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)