sabarnwal opened a new issue, #29800:
URL: https://github.com/apache/airflow/issues/29800

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   We have deployed Airflow 2.3.3 on our Kubernetes cluster using Helm, and we use the KubernetesExecutor for our tasks.
   The issue is that if the scheduler pod crashes, the pods of the running tasks are marked as succeeded (the pod terminates successfully), while the underlying tasks are marked as failed.
   
   
   
   ### What you think should happen instead
   
   According to the documentation, a scheduler crash should not cause running tasks to fail:
   
   > In cases of scheduler crashes, the scheduler will recover its state using the watcher’s resourceVersion.
   >
   > When monitoring the Kubernetes cluster’s watcher thread, each event has a monotonically rising number called a resourceVersion. Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database. Because the resourceVersion is stored, the scheduler can restart and continue reading the watcher stream from where it left off. Since the tasks are run independently of the executor and report results directly to the database, scheduler failures will not lead to task failures or re-runs.
   
   ### How to reproduce
   
   On 2.3.3, trigger a DAG manually. After the task has started, kill the scheduler pod manually. The running tasks are then killed as well, with a SIGTERM error.
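The reproduction steps can be scripted roughly as below. This is a hedged sketch under assumptions not stated in the report: the chart is installed in the `airflow` namespace, scheduler pods carry the label `component=scheduler` (verify with `kubectl get pods --show-labels`), and the Airflow CLI is run somewhere configured against this deployment's metadata database (triggering from the UI works just as well). `kubectl delete pod` performs a graceful deletion, which sends SIGTERM to the scheduler container. By default the script only prints the commands.

```python
import shlex
import subprocess
from typing import List

# Assumptions, adjust to your deployment: the chart is installed in the
# "airflow" namespace and scheduler pods carry the label component=scheduler.
NAMESPACE = "airflow"
SCHEDULER_SELECTOR = "component=scheduler"


def trigger_command(dag_id: str) -> List[str]:
    """Step 1: trigger the DAG (assumes the Airflow CLI can reach the deployment)."""
    return ["airflow", "dags", "trigger", dag_id]


def kill_scheduler_command(namespace: str = NAMESPACE,
                           selector: str = SCHEDULER_SELECTOR) -> List[str]:
    """Step 2: delete the scheduler pod; graceful deletion sends SIGTERM first."""
    return ["kubectl", "delete", "pod", "-n", namespace, "-l", selector]


def reproduce(dag_id: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    steps = [trigger_command(dag_id), kill_scheduler_command()]
    for i, cmd in enumerate(steps):
        if dry_run:
            print(shlex.join(cmd))
            continue
        if i == 1:
            # Only kill the scheduler once the task's worker pod is Running.
            input("Press Enter once the task pod is Running...")
        subprocess.run(cmd, check=True)


# "example_dag" is a placeholder DAG id.
reproduce("example_dag")
```

Running it with `dry_run=False` performs the steps for real; afterwards, compare the worker pod's phase (`kubectl get pods -n airflow`) with the task instance state in the UI.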
   
   ### Operating System
   
   PRETTY_NAME="Debian GNU/Linux 11 (bullseye)" NAME="Debian GNU/Linux"
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==4.0.0
   apache-airflow-providers-apache-spark==3.0.0
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-cncf-kubernetes==4.1.0
   apache-airflow-providers-docker==3.0.0
   apache-airflow-providers-elasticsearch==4.0.0
   apache-airflow-providers-ftp==3.0.0
   apache-airflow-providers-google==8.1.0
   apache-airflow-providers-grpc==3.0.0
   apache-airflow-providers-hashicorp==3.0.0
   apache-airflow-providers-http==3.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-microsoft-azure==4.0.0
   apache-airflow-providers-mongo==3.0.0
   apache-airflow-providers-mysql==3.0.0
   apache-airflow-providers-odbc==3.0.0
   apache-airflow-providers-postgres==5.0.0
   apache-airflow-providers-redis==3.0.0
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-sftp==3.0.0
   apache-airflow-providers-slack==5.0.0
   apache-airflow-providers-sqlite==3.0.0
   apache-airflow-providers-ssh==3.0.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   This happens every time the scheduler pod goes down.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

