hterik opened a new issue, #25021: URL: https://github.com/apache/airflow/issues/25021
### Apache Airflow version

2.3.2

### What happened

The scheduler was restarted. After the restart, it started resetting some running tasks as orphaned.

I have seen https://github.com/apache/airflow/issues/20982, which lists this as a known issue _for manually started tasks_, but we also see it occasionally for scheduled tasks.

This appears to be a regression introduced by the 2.3 upgrade. I don't recall ever seeing it in 2.2, but now we hit it almost every time the scheduler restarts, which happens almost once per day because of crashes caused by flaky connections to the Kubernetes API or PostgreSQL.

### What you think should happen instead

Tasks should be adopted.

### How to reproduce

Start some tasks running on Kubernetes with CeleryKubernetesExecutor, then restart the scheduler.

The scheduler logs show the following:

```
2022-07-13 09:59:19 {scheduler_job.py:353} INFO tasks up for execution: <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [scheduled]>
2022-07-13 09:59:19 {scheduler_job.py:504} INFO Setting the following tasks to queued state: <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [scheduled]>
2022-07-13 09:59:20 {scheduler_job.py:633} INFO Setting external_id for <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [queued]> to 38321
... Scheduler crashes and restarts here ...
2022-07-13 10:42:59 {scheduler_job.py:1285} Reset the following 8 orphaned TaskInstances: <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [running]>
....
2022-07-13 10:43:00 {scheduler_job.py:353} Level=INFO Message=10 tasks up for execution: <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [scheduled]>
....
....
2022-07-13 10:43:00 {scheduler_job.py:504} INFO Setting the following tasks to queued state: <TaskInstance: XXXXX scheduled__2022-07-12T16:00:00+00:00 [scheduled]>
...
```

I don't know which other logs might be relevant.
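To check how often a restart resets a task instance that had already been handed to the executor, the `Reset the following ... orphaned TaskInstances` entries can be cross-referenced with earlier `Setting external_id for ...` entries for the same task instance. The Python sketch below does that over a scheduler log. It is a rough triage aid under stated assumptions, not part of Airflow: `reset_after_external_id` is a hypothetical helper, and the regexes are inferred only from the excerpt above. The helper assumes reset task instances appear either on the header line itself or on indented continuation lines directly after it.

```python
import re
from typing import Iterable, List, Tuple

# Assumed patterns, inferred only from the log excerpt in this issue;
# adjust them if your scheduler log format differs.
EXTERNAL_ID_RE = re.compile(
    r"Setting external_id for <TaskInstance: (\S+) (\S+) \[\w+\]> to (\S+)"
)
RESET_HEADER_RE = re.compile(r"Reset the following \d+ orphaned TaskInstances:")
TASK_INSTANCE_RE = re.compile(r"<TaskInstance: (\S+) (\S+) \[\w+\]>")


def reset_after_external_id(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (ti_name, run_id, external_id) for every task
    instance reset as orphaned after an external_id had been logged for it."""
    external_ids = {}
    flagged = []
    in_reset_block = False
    for line in lines:
        match = EXTERNAL_ID_RE.search(line)
        if match:
            # Remember the most recent external_id per (ti_name, run_id).
            external_ids[(match.group(1), match.group(2))] = match.group(3)
            in_reset_block = False
            continue
        if RESET_HEADER_RE.search(line):
            in_reset_block = True
        elif not (in_reset_block and line[:1].isspace()):
            # Any non-indented line ends a (possibly multi-line) reset block.
            in_reset_block = False
            continue
        # Scan the header line or an indented continuation line for task instances.
        for ti_name, run_id in TASK_INSTANCE_RE.findall(line):
            key = (ti_name, run_id)
            if key in external_ids:
                flagged.append((ti_name, run_id, external_ids[key]))
    return flagged
```

Usage: `reset_after_external_id(open("scheduler.log"))`, where the file name is a placeholder. A non-empty result lists task instances that already had an external_id logged when the restarted scheduler reset them, which matches the `XXXXX` example in the log above.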
### Operating System

Debian GNU/Linux 11 (bullseye)

### Versions of Apache Airflow Providers

- apache-airflow==2.3.2
- apache-airflow-client==2.1.0
- apache-airflow-providers-celery==3.0.0
- apache-airflow-providers-cncf-kubernetes==4.0.2
- apache-airflow-providers-docker==3.0.0
- apache-airflow-providers-ftp==2.1.2
- apache-airflow-providers-http==2.1.2
- apache-airflow-providers-imap==2.2.3
- apache-airflow-providers-postgres==5.0.0
- apache-airflow-providers-sqlite==2.1.3

### Deployment

Other Docker-based deployment

### Deployment details

PostgreSQL as the metadata database.

### Anything else

This happens almost every time the scheduler restarts.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
