sabarnwal opened a new issue, #29800: URL: https://github.com/apache/airflow/issues/29800
### Apache Airflow version

Other Airflow 2 version (please specify below)

### What happened

We have deployed Airflow 2.3.3 using Helm on our Kubernetes cluster, and we use the Kubernetes executor for our tasks. If the scheduler pod crashes, the pods running those tasks are marked as successful (the pods terminate cleanly), but the underlying tasks are marked as failed.

### What you think should happen instead

According to the docs:

> In cases of scheduler crashes, the scheduler will recover its state using the watcher’s resourceVersion. When monitoring the Kubernetes cluster’s watcher thread, each event has a monotonically rising number called a resourceVersion. Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database. Because the resourceVersion is stored, the scheduler can restart and continue reading the watcher stream from where it left off. Since the tasks are run independently of the executor and report results directly to the database, scheduler failures will not lead to task failures or re-runs.

Based on this, running tasks should not fail when the scheduler crashes.

### How to reproduce

On 2.3.3:

1. Trigger a DAG manually.
2. After the task has started, kill the scheduler pod manually.

Result: the running tasks are also killed, with a SIGTERM error.
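To make the recovery behaviour quoted under "What you think should happen instead" concrete, here is a minimal sketch of the idea as I understand it from the docs. It is a toy simulation, not Airflow's actual implementation: `PodEvent`, `watch`, and `run_watcher` are hypothetical names, a plain dict stands in for the backend database, and a list stands in for the Kubernetes watch stream.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class PodEvent:
    resource_version: int  # monotonically rising, as the docs describe
    pod_name: str
    phase: str


def watch(events: List[PodEvent], since: int) -> Iterator[PodEvent]:
    """Hypothetical stand-in for a watch stream: yield events newer than `since`."""
    for event in events:
        if event.resource_version > since:
            yield event


def run_watcher(
    events: List[PodEvent],
    store: Dict[str, int],
    task_states: Dict[str, str],
    crash_after: Optional[int] = None,
) -> int:
    """Consume events, record each pod phase, and persist the latest resourceVersion.

    `store` stands in for the backend database. `crash_after` simulates a
    scheduler crash after handling that many events. Returns the number of
    events handled in this run.
    """
    handled = 0
    for event in watch(events, store.get("resource_version", 0)):
        task_states[event.pod_name] = event.phase
        store["resource_version"] = event.resource_version
        handled += 1
        if crash_after is not None and handled >= crash_after:
            break  # simulated crash: stop mid-stream
    return handled


events = [
    PodEvent(1, "task-a", "Pending"),
    PodEvent(2, "task-a", "Running"),
    PodEvent(3, "task-b", "Running"),
    PodEvent(4, "task-a", "Succeeded"),
    PodEvent(5, "task-b", "Succeeded"),
]
store: Dict[str, int] = {}
task_states: Dict[str, str] = {}

run_watcher(events, store, task_states, crash_after=2)  # scheduler "crashes" here
run_watcher(events, store, task_states)                 # restarted scheduler resumes
print(store, task_states)
```

In this sketch the restarted watcher picks up at the stored resourceVersion and still sees both pods finish, which is the outcome I would expect from the documentation rather than the SIGTERM failures described above.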
### Operating System

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"

### Versions of Apache Airflow Providers

- apache-airflow-providers-amazon==4.0.0
- apache-airflow-providers-apache-spark==3.0.0
- apache-airflow-providers-celery==3.0.0
- apache-airflow-providers-cncf-kubernetes==4.1.0
- apache-airflow-providers-docker==3.0.0
- apache-airflow-providers-elasticsearch==4.0.0
- apache-airflow-providers-ftp==3.0.0
- apache-airflow-providers-google==8.1.0
- apache-airflow-providers-grpc==3.0.0
- apache-airflow-providers-hashicorp==3.0.0
- apache-airflow-providers-http==3.0.0
- apache-airflow-providers-imap==3.0.0
- apache-airflow-providers-microsoft-azure==4.0.0
- apache-airflow-providers-mongo==3.0.0
- apache-airflow-providers-mysql==3.0.0
- apache-airflow-providers-odbc==3.0.0
- apache-airflow-providers-postgres==5.0.0
- apache-airflow-providers-redis==3.0.0
- apache-airflow-providers-sendgrid==3.0.0
- apache-airflow-providers-sftp==3.0.0
- apache-airflow-providers-slack==5.0.0
- apache-airflow-providers-sqlite==3.0.0
- apache-airflow-providers-ssh==3.0.0

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else

This happens every time the scheduler pod goes down.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
