paramjeet01 opened a new issue, #39791:
URL: https://github.com/apache/airflow/issues/39791

   ### Apache Airflow version
   
   main (development)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.3
   
   ### What happened?
   
   **Issue:** 
   When a worker pod is killed, the task pod it launched is killed as well, 
   even though `reattach_on_restart` is set to True. The task pod is expected 
   to survive in that case.
   
   **Case:** 
   Our Airflow deployment runs on AWS EC2 spot instances, so worker pods are 
   expected to be killed occasionally when an EC2 instance is interrupted. 
   When the EC2 instance is removed from the Kubernetes nodes, SIGTERM is sent 
   to the worker pod. This reaches the `_execute_task_with_callbacks` method 
   linked below, where the `on_kill` method is called, and `on_kill` kills the 
   task pod. As a result, `reattach_on_restart` does not work as expected, 
   because the task pod is deleted whenever the worker pod is killed.
   
https://github.com/apache/airflow/blob/2d53c1089f78d8d1416f51af60e1e0354781c661/airflow/models/taskinstance.py#L2592-L2613
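
The sequence described above can be modelled with a small standard-library sketch. It is a simplified illustration of the reported behaviour, not Airflow's code: `FakeCluster`, `FakePodOperator`, and `handle_sigterm` are hypothetical stand-ins for the Kubernetes API, `KubernetesPodOperator`, and the worker-side signal handling.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Hypothetical stand-in for the Kubernetes API: tracks which pods exist."""
    pods: set = field(default_factory=set)

    def delete_pod(self, name: str) -> None:
        self.pods.discard(name)


@dataclass
class FakePodOperator:
    """Hypothetical, simplified stand-in for KubernetesPodOperator."""
    cluster: FakeCluster
    pod_name: str
    reattach_on_restart: bool = True

    def execute(self) -> None:
        # Launch the task pod.
        self.cluster.pods.add(self.pod_name)

    def on_kill(self) -> None:
        # Behaviour reported in this issue: the pod is deleted
        # unconditionally; reattach_on_restart is not consulted.
        self.cluster.delete_pod(self.pod_name)


def handle_sigterm(operator: FakePodOperator) -> None:
    """Models the worker-side SIGTERM handling, which ends up calling on_kill()."""
    operator.on_kill()


cluster = FakeCluster()
op = FakePodOperator(cluster, "task-pod", reattach_on_restart=True)
op.execute()
pod_running_before = "task-pod" in cluster.pods

handle_sigterm(op)  # the worker pod receives SIGTERM
pod_running_after = "task-pod" in cluster.pods

print(pod_running_before, pod_running_after)
```

In this model the task pod is gone after the simulated SIGTERM, so a restarted worker has nothing left to reattach to.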
   
   
   
   ### What you think should happen instead?
   
   The task pod should remain active when a worker pod is terminated if 
   `reattach_on_restart` is set to True.
   
   ### How to reproduce
   
   - Create a task that uses `KubernetesPodOperator`.
   - Set `reattach_on_restart` to True in the task.
   - Delete the worker pod while the task is running. Both the worker pod and 
     the task pod are deleted.
   
   ### Operating System
   
   Amazon Linux 2
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-cncf-kubernetes==8.2.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   **Suggestion:** 
   The `on_kill` method should receive an additional parameter stating who 
   triggered it (the clear task button in the Airflow UI, or a SIGTERM 
   signal), so that the operator can decide whether to keep or delete the pod.
   
https://github.com/apache/airflow/blob/fe4605a10e26f1b8a180979ba5765d1cb7fb0111/airflow/providers/cncf/kubernetes/operators/pod.py#L989-L1003
   
   If someone can help me with the taskinstance.py changes needed for the 
   above recommendation, I'll be able to refactor the `on_kill` method.
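
A minimal sketch of the suggested behaviour, using only the standard library. Everything here is an assumption for illustration: the `KillReason` enum, the `reason` argument to `on_kill`, and the `FakePodOperator` class do not exist in Airflow, and how the caller would reliably determine the reason is left open.

```python
from enum import Enum


class KillReason(Enum):
    """Hypothetical: who asked for the task to be killed."""
    USER_REQUEST = "user_request"  # e.g. the task was cleared in the UI
    SIGTERM = "sigterm"            # e.g. the worker pod lost its spot node


class FakePodOperator:
    """Hypothetical, simplified stand-in for KubernetesPodOperator."""

    def __init__(self, pods: set, pod_name: str, reattach_on_restart: bool):
        self.pods = pods
        self.pod_name = pod_name
        self.reattach_on_restart = reattach_on_restart

    def on_kill(self, reason: KillReason) -> None:
        # Keep the pod only when the worker itself is going away and the
        # operator is configured to reattach on the next attempt.
        if reason is KillReason.SIGTERM and self.reattach_on_restart:
            return
        self.pods.discard(self.pod_name)


def pod_survives(reason: KillReason, reattach_on_restart: bool) -> bool:
    """Run on_kill against a one-pod 'cluster' and report whether the pod remains."""
    pods = {"task-pod"}
    FakePodOperator(pods, "task-pod", reattach_on_restart).on_kill(reason)
    return "task-pod" in pods


for reason in KillReason:
    for reattach in (True, False):
        print(reason.name, reattach, pod_survives(reason, reattach))
```

With this rule the pod is kept only for the combination of a SIGTERM and `reattach_on_restart=True`; a user-initiated kill still deletes it.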
   
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

