Bisk1 opened a new issue, #33661: URL: https://github.com/apache/airflow/issues/33661
### Apache Airflow version

2.7.0

### What happened

We recently added automation that clears Airflow tasks, so we use this feature a lot. Clearing often happens while tasks are in the RUNNING state, which moves them to the RESTARTING state. We noticed that many of those tasks get stuck in RESTARTING.

Our Airflow infrastructure runs in an environment where any process can be killed suddenly, without a graceful shutdown. We run Airflow on GKE, but I managed to reproduce this behaviour in a local environment with SequentialExecutor. See the steps to reproduce below for details.

### What you think should happen instead

Tasks should get cleaned up after a scheduler restart and eventually get scheduled and executed.

### How to reproduce

After some code investigation, I reproduced this behaviour in a local environment. It seems that RESTARTING tasks are only handled properly if the original task process shuts down gracefully, so that it can mark the task as UP_FOR_RETRY, or if there is at least a healthy scheduler to do so when the task fails for any other reason.

The problem occurs in the following scenario:

1. The task is initially in the RUNNING state.
2. The scheduler process dies suddenly.
3. The task process also dies suddenly.
4. The clear command is issued, so the task moves to the RESTARTING state.
5. From now on, even if we restart the scheduler, the task will never get scheduled or change its state. Its state has to be fixed manually, e.g. by clearing it again.

A recording of the steps to reproduce in a local environment: https://vimeo.com/857192666?share=copy

### Operating System

MacOS Ventura 13.4.1

### Versions of Apache Airflow Providers

N/A

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

N/A

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
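For illustration only, the scenario from "How to reproduce" can be sketched as a toy state model. This is not Airflow's scheduler code: `ToyTask`, `scheduler_startup_cleanup` and `RECOVERABLE_ON_STARTUP` are hypothetical names, and the assumption that startup cleanup only considers RUNNING tasks is a guess made to mirror the observed behaviour, not something confirmed from the source.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    RESTARTING = "restarting"
    UP_FOR_RETRY = "up_for_retry"
    SCHEDULED = "scheduled"


# Assumption (hypothetical): states the toy scheduler treats as orphaned on startup.
RECOVERABLE_ON_STARTUP = frozenset({TaskState.RUNNING})


class ToyTask:
    """Toy stand-in for a task instance; not an Airflow class."""

    def __init__(self):
        self.state = TaskState.RUNNING
        self.process_alive = True

    def kill(self):
        # Sudden death: the process gets no chance to update the task state.
        self.process_alive = False

    def clear(self):
        # Per the issue: clearing a task in RUNNING moves it to RESTARTING.
        if self.state is TaskState.RUNNING:
            self.state = TaskState.RESTARTING

    def graceful_shutdown(self):
        # Only a live process can mark a RESTARTING task as UP_FOR_RETRY.
        if self.process_alive and self.state is TaskState.RESTARTING:
            self.state = TaskState.UP_FOR_RETRY
        self.process_alive = False


def scheduler_startup_cleanup(task, recoverable=RECOVERABLE_ON_STARTUP):
    """Reset a dead task to SCHEDULED only if its state is in `recoverable`."""
    if not task.process_alive and task.state in recoverable:
        task.state = TaskState.SCHEDULED
    return task.state


def stuck_scenario(recoverable=RECOVERABLE_ON_STARTUP):
    task = ToyTask()
    task.kill()    # steps 2-3: scheduler and task process die suddenly
    task.clear()   # step 4: clear moves the task to RESTARTING
    return scheduler_startup_cleanup(task, recoverable)  # step 5: restart


def graceful_scenario():
    task = ToyTask()
    task.clear()              # task is cleared while its process is alive
    task.graceful_shutdown()  # process marks the task UP_FOR_RETRY
    return task.state


if __name__ == "__main__":
    print("sudden kill:", stuck_scenario().value)
    print("graceful shutdown:", graceful_scenario().value)
```

In this model the task only leaves RESTARTING when its own process shuts down gracefully. Passing a `recoverable` set that also contains `TaskState.RESTARTING` to `stuck_scenario` lets the toy cleanup reschedule the task, which is one possible shape of the expected behaviour described above.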
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
