adammarchewka opened a new issue, #30709:
URL: https://github.com/apache/airflow/issues/30709
### Apache Airflow version
2.5.3
### What happened
The Airflow Scheduler has started to fail when it is restarted (either
manually or forcefully). Some task instances remain stuck in the running/queued
state after the restart, and the Scheduler somehow loses its reference to them
(or fails to re-adopt them), which results in a critical error about a missing
TaskInstance.
Recovering from the error requires manual intervention in the Airflow database
(manually setting the stuck tasks to the failed state).
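For illustration, the manual intervention described above can be sketched roughly as follows. This is only a toy example run against an in-memory SQLite database: the four-column `task_instance` table is a simplified stand-in for the real Airflow metadata table, and the helper names `build_toy_db` and `fail_stuck_task_instances` as well as the DAG and task names are hypothetical. The exact statement we ran against our production database is not shown here.

```python
import sqlite3

# States that were left behind after the scheduler restart in this report.
STUCK_STATES = ("running", "queued")


def fail_stuck_task_instances(conn, dag_id):
    """Set running/queued task instances of one DAG to 'failed'.

    Returns the number of rows that were updated.
    """
    placeholders = ", ".join("?" for _ in STUCK_STATES)
    cur = conn.execute(
        "UPDATE task_instance SET state = 'failed' "
        f"WHERE dag_id = ? AND state IN ({placeholders})",
        (dag_id, *STUCK_STATES),
    )
    conn.commit()
    return cur.rowcount


def build_toy_db():
    """Create a simplified, hypothetical stand-in for the task_instance table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(dag_id TEXT, task_id TEXT, run_id TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("example_dag", "extract", "run_1", "running"),
            ("example_dag", "load", "run_1", "queued"),
            ("example_dag", "report", "run_1", "success"),
            ("other_dag", "extract", "run_1", "running"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    updated = fail_stuck_task_instances(conn, "example_dag")
    print(f"task instances set to failed: {updated}")
```

Scoping the update to a single `dag_id` keeps the change narrow. Task instances of other DAGs and task instances that have already finished are left untouched.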
### What you think should happen instead
The Scheduler should shut down gracefully within the given time and restart
properly afterward without raising ObjectDeletedError.
### How to reproduce
Restart airflow-scheduler, or redeploy the whole Airflow installation, while
tasks are running (that is, while they are being processed by the
Scheduler/Workers).
We encounter the issue with every restart/redeploy. We are not sure whether it
is reproducible outside our system.
### Operating System
Debian GNU/Linux 11 (bullseye)
### Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==5.2.2
apache-airflow-providers-common-sql==1.3.4
apache-airflow-providers-docker==3.5.1
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==9.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.0
apache-airflow-providers-http==4.2.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mysql==4.0.2
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.4
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.5.0
### Deployment
Other 3rd-party Helm chart
### Deployment details
Kubernetes versions:
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3",
GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean",
BuildDate:"2022-10-12T10:57:26Z", GoVersion:"go1.19.2", Compiler:"gc",
Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"24",
GitVersion:"v1.24.10-gke.2300",
GitCommit:"1d7ae0799b40b0cd95502e3a5e698db62572e341", GitTreeState:"clean",
BuildDate:"2023-02-22T09:28:49Z", GoVersion:"go1.19.6 X:boringcrypto",
Compiler:"gc", Platform:"linux/amd64"}
Helm version: 3.11.2-1
Deployment via: https://github.com/airflow-helm/charts
### Anything else
Scheduler error log:
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/cli/cli_parser.py", line 52, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/cli.py", line 108, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/cli/commands/scheduler_command.py", line 73, in scheduler
    _run_scheduler_job(args=args)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/cli/commands/scheduler_command.py", line 43, in _run_scheduler_job
    job.run()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/jobs/base_job.py", line 258, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/jobs/scheduler_job.py", line 759, in _execute
    self._run_scheduler_loop()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/jobs/scheduler_job.py", line 840, in _run_scheduler_loop
    self.adopt_or_reset_orphaned_tasks()
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/utils/session.py", line 75, in wrapper
    return func(*args, session=session, **kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/jobs/scheduler_job.py", line 1413, in adopt_or_reset_orphaned_tasks
    for attempt in run_with_db_retries(logger=self.log):
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 347, in __iter__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/jobs/scheduler_job.py", line 1458, in adopt_or_reset_orphaned_tasks
    to_reset = self.executor.try_adopt_task_instances(tis_to_reset_or_adopt)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/executors/celery_executor.py", line 503, in try_adopt_task_instances
    if ti.external_executor_id is not None:
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 482, in __get__
    return self.impl.get(state, dict_)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 942, in get
    value = self._fire_loader_callables(state, key, passive)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/attributes.py", line 976, in _fire_loader_callables
    return callable_(state, passive)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/strategies.py", line 561, in __call__
    return strategy._load_for_state(state, passive)
  File "/home/airflow/.local/lib/python3.10/site-packages/sqlalchemy/orm/strategies.py", line 530, in _load_for_state
    raise orm_exc.ObjectDeletedError(state)
sqlalchemy.orm.exc.ObjectDeletedError: Instance '<TaskInstance at 0x7f29c778f130>' has been deleted, or its row is otherwise not present.
Custom Helm Values:
airflow:
  config:
    AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT: "360"
    AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL: "60"
    AIRFLOW__CELERY__WORKER_CONCURRENCY: 5
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)