aspirepadma-dot opened a new issue, #63926:
URL: https://github.com/apache/airflow/issues/63926

   ### Apache Airflow version
   
   3.1.8
   
   ### If "Other Airflow 3 version" selected, which one?
   
   3.0.6, 3.1.7 and 3.1.8
   
   ### What happened?
   
   The Airflow scheduler consistently crashes after several hours of healthy
operation (approx. 8 hours). We have 74 DAGs enabled and they are processed
normally at first. After a few hours, once roughly 5k DAG runs have been
reached, the scheduler goes unhealthy while its process keeps running. The
crash surfaces as a sqlalchemy.orm.exc.StaleDataError, which leads to a
service failure and a subsequent OOM kill by the OS. This has been observed
on versions 3.0.6, 3.1.7, and 3.1.8.
   
   1-177-248.ec2.internal airflow[2057]: [2026-03-13T13:11:51.969771Z] 
{scheduler_job_runner.py:744} INFO - Trying to enqueue tasks: [<TaskInstance: 
transient_emr_census_snapshot_reporting_initial.watch_census_snapshot_reporting_initial
 scheduled__2026-03-13T13:00:00+00:00 [scheduled]>] for executor: 
LocalExecutor(parallelism=128)
   Mar 13 13:11:51 ip-10-11-177-248.ec2.internal airflow[2493]: 
[2026-03-13T13:11:51.973285Z] {supervisor.py:1975} INFO - Secrets backends 
loaded for worker count=2 backend_classes=['EnvironmentVariablesBackend', 
'MetastoreBackend']
   Mar 13 13:11:53 ip-10-11-177-248.ec2.internal airflow[2481]: 
[2026-03-13T13:11:53.717939Z] {supervisor.py:1995} INFO - Task finished 
task_instance_id=019ce749-1591-79b1-b844-b1c7427e1b05 exit_code=0 
duration=181.57094679700094 final_state=success
   Mar 13 13:11:55 ip-10-11-177-248.ec2.internal airflow[2519]: 
[2026-03-13T13:11:55.546757Z] {supervisor.py:1995} INFO - Task finished 
task_instance_id=019ce752-1476-7598-9396-0b1c85b55e29 exit_code=0 
duration=31.381990759000473 final_state=up_for_retry
   Mar 13 13:11:56 ip-10-11-177-248.ec2.internal airflow[2526]: 
[2026-03-13T13:11:56.209305Z] {supervisor.py:1995} INFO - Task finished 
task_instance_id=019ce752-147c-755d-ab97-469fb7883f1e exit_code=0 
duration=61.36306266200336 final_state=success
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: 
[2026-03-13T13:11:57.953125Z] {scheduler_job_runner.py:1086} ERROR - Exception 
when executing SchedulerJob._run_scheduler_loop
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: Traceback (most 
recent call last):
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 1082, in _execute
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
self._run_scheduler_loop()
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 1372, in _run_scheduler_loop
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
num_queued_tis = self._do_scheduling(session)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 1482, in _do_scheduling
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
callback_tuples = self._schedule_all_dag_runs(guard, dag_runs, session)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/retries.py",
 line 97, in wrapped_function
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     for attempt 
in run_with_db_retries(max_retries=retries, logger=logger, **retry_kwargs):
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
 line 438, in __iter__
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     do = 
self.iter(retry_state=retry_state)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:          
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
 line 371, in iter
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     result = 
action(retry_state)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:              
^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/tenacity/__init__.py",
 line 393, in <lambda>
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
self._add_action_func(lambda rs: rs.outcome.result())
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
                     ^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     return 
self.__get_result()
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:            
^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     raise 
self._exception
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/retries.py",
 line 106, in wrapped_function
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     return 
func(*args, **kwargs)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:            
^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 1924, in _schedule_all_dag_runs
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
callback_tuples = [(run, self._schedule_dag_run(run, session=session)) for run 
in dag_runs]
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 2028, in _schedule_dag_run
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     if not 
dag_run.bundle_version and not self._verify_integrity_if_dag_changed(
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:                 
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/jobs/scheduler_job_runner.py",
 line 2092, in _verify_integrity_if_dag_changed
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
dag_run.verify_integrity(dag_version_id=latest_dag_version.id, session=session)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/utils/session.py",
 line 98, in wrapper
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     return 
func(*args, **kwargs)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:            
^^^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/models/dagrun.py",
 line 1694, in verify_integrity
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
self._create_task_instances(self.dag_id, tis_to_create, created_counts, 
hook_is_noop, session=session)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/airflow/models/dagrun.py",
 line 1901, in _create_task_instances
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
session.flush()
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
 line 4331, in flush
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
self._flush(objects)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
 line 4466, in _flush
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     with 
util.safe_reraise():
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:          
^^^^^^^^^^^^^^^^^^^
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/util/langhelpers.py",
 line 224, in __exit__
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     raise 
exc_value.with_traceback(exc_tb)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/session.py",
 line 4427, in _flush
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
flush_context.execute()
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
 line 466, in execute
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
rec.execute(self)
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/unitofwork.py",
 line 642, in execute
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
util.preloaded.orm_persistence.save_obj(
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/persistence.py",
 line 85, in save_obj
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     
_emit_update_statements(
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:   File 
"/home/airflow/airflow_venv/lib64/python3.12/site-packages/sqlalchemy/orm/persistence.py",
 line 948, in _emit_update_statements
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]:     raise 
orm_exc.StaleDataError(
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: 
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'task_instance' 
expected to update 39 row(s); 38 were matched.
   Mar 13 13:11:57 ip-10-11-177-248.ec2.internal airflow[2057]: 
[2026-03-13T13:11:57.967698Z] {local_executor.py:252} INFO - Shutting down 
LocalExecutor; waiting for running tasks to finish.  Signal again if you don't 
want to wait.
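
The final error in the log above reports a row-count mismatch: the flush expected its UPDATE to match 39 `task_instance` rows and only 38 still existed. The sketch below illustrates that general mechanism with the standard-library `sqlite3` module only. It is not Airflow or SQLAlchemy code, and it does not claim to identify the root cause here. `StaleRowsError`, `update_states`, and `demo` are hypothetical names, and the deleted row id is arbitrary.

```python
import sqlite3


class StaleRowsError(Exception):
    """Hypothetical stand-in for sqlalchemy.orm.exc.StaleDataError."""


def update_states(conn, ids, new_state):
    """Update each id and verify that every UPDATE matched a row."""
    matched = 0
    for ti_id in ids:
        cur = conn.execute(
            "UPDATE task_instance SET state = ? WHERE id = ?",
            (new_state, ti_id),
        )
        # rowcount is 1 if the row still exists, 0 if it has disappeared.
        matched += cur.rowcount
    if matched != len(ids):
        raise StaleRowsError(
            f"UPDATE statement on table 'task_instance' expected to update "
            f"{len(ids)} row(s); {matched} were matched."
        )
    return matched


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (id INTEGER PRIMARY KEY, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance (id, state) VALUES (?, ?)",
        [(i, "scheduled") for i in range(1, 40)],
    )
    # The "session" remembers 39 rows it intends to update later.
    loaded_ids = list(range(1, 40))
    # Assumption for illustration: something else removes one row in between
    # (done on the same connection here to keep the sketch short).
    conn.execute("DELETE FROM task_instance WHERE id = 7")
    try:
        update_states(conn, loaded_ids, "removed")
    except StaleRowsError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    print(demo())
```

If this mechanism applies, a useful next step would be to check which process modifies or deletes `task_instance` rows for the affected DAG run between the scheduler loading them and flushing its changes.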
   
   ### What you think should happen instead?
   
   The scheduler should go down and restart automatically.
   
   ### How to reproduce
   
   Spin up a new Airflow instance, sync the DAGs, enable them, and leave it
running. After 6 or 7 hours the scheduler suddenly becomes unhealthy.
   
   ### Operating System
   
   linux
   
   ### Versions of Apache Airflow Providers
   
   This is with 3.0.6; the same behaviour occurs in 3.1.7 and 3.1.8.
   
   Apache Airflow
   version                | 3.0.6
   executor               | LocalExecutor
   task_logging_handler   | airflow.utils.log.file_task_handler.FileTaskHandler
   sql_alchemy_conn       | 
postgresql+psycopg2://postgres:19ffc405539a19c5a023b933@localhost:5432/airflowdb_13
   dags_folder            | /home/airflow/airflow/dags
   plugins_folder         | /home/airflow/airflow/plugins
   base_log_folder        | /home/airflow/airflow/logs
   remote_base_log_folder | s3://tt-dp-airflow-us-east-1-stg-resources-13/logs/
   
   
   System info
   OS              | Linux
   architecture    | arm
   uname           | uname_result(system='Linux', 
node='ip-10-11-130-200.ec2.internal', 
release='6.1.141-155.222.amzn2023.aarch64', version='#1 SMP Tue Jun 17 10:29:19 
UTC 2025', machine='aarch64')
   locale          | ('C', 'UTF-8')
   python_version  | 3.12.10 (main, Jun  4 2025, 00:00:00) [GCC 11.5.0 20240719 
(Red Hat 11.5.0-5)]
   python_location | /home/airflow/airflow_venv/bin/python3.12
   
   
   Tools info
   git             | git version 2.47.1
   ssh             | OpenSSH_8.7p1, OpenSSL 3.2.2 4 Jun 2024
   kubectl         | NOT AVAILABLE
   gcloud          | NOT AVAILABLE
   cloud_sql_proxy | NOT AVAILABLE
   mysql           | NOT AVAILABLE
   sqlite3         | NOT AVAILABLE
   psql            | psql (PostgreSQL) 17.5
   
   
   Paths info
   airflow_home    | /home/airflow/airflow
   system_path     | 
/home/airflow/.local/bin:/home/airflow/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/snapd/snap/bin
   python_path     | 
/home/airflow/airflow_venv/bin:/usr/lib64/python312.zip:/usr/lib64/python3.12:/usr/lib64/python3.12/lib-dynload:/home/airflow/airflow_venv/lib64/python3.12/site-packages:/home/airflow/airflow_venv/lib/python3.12/site-packages:/
                   | home/airflow/airflow/config:/home/airflow/airflow/plugins
   airflow_on_path | True
   
   
   Providers info
   apache-airflow-providers-amazon          | 9.12.0
   apache-airflow-providers-apache-livy     | 4.4.2
   apache-airflow-providers-apache-spark    | 5.3.2
   apache-airflow-providers-celery          | 3.12.2
   apache-airflow-providers-cncf-kubernetes | 10.7.0
   apache-airflow-providers-common-compat   | 1.7.3
   apache-airflow-providers-common-io       | 1.6.2
   apache-airflow-providers-common-sql      | 1.27.5
   apache-airflow-providers-fab             | 2.4.1
   apache-airflow-providers-http            | 5.3.3
   apache-airflow-providers-postgres        | 6.2.3
   apache-airflow-providers-smtp            | 2.2.0
   apache-airflow-providers-ssh             | 4.1.3
   apache-airflow-providers-standard        | 1.6.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
