GitHub user arkadiusz-bach edited a comment on the discussion: Intermittent 
SIGTERM running on K8S

@b32n , @mschueler do you have `.Values.workers.safeToEvict` set to `false`? Is the Cluster Autoscaler enabled in your k8s cluster?

Until not so long ago it defaulted to `true`; 8 months ago the default was changed to `false`: [here](https://github.com/apache/airflow/commit/4c9f12d2bce5af0c629e9b9e9916cd525817931d)
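A minimal sketch of the relevant Helm values override, assuming the official `apache/airflow` chart's `workers.safeToEvict` key (which renders the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation on worker pods):

```yaml
# values.yaml override for the official Airflow Helm chart
workers:
  safeToEvict: false

# Which renders roughly this annotation on each worker pod,
# telling the Cluster Autoscaler not to evict it when draining
# an underutilized node:
#
#   metadata:
#     annotations:
#       cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```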

- The pod may receive SIGTERM when the Autoscaler tries to remove a node because it is underutilized.
- It may behave differently on different clouds:
  - Azure - the Autoscaler has its own termination grace period (`max-graceful-termination-sec`) and ignores the `terminationGracePeriodSeconds` defined on the pod. By default it is 600 seconds, so when the Autoscaler downscales a node it sends SIGTERM to the pods on that node, and any pod that has not shut down within 600 seconds is killed, even if its own `terminationGracePeriodSeconds` is 7200.
  - EKS - I couldn't find anything similar to `max-graceful-termination-sec` among the EKS Autoscaler [settings](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html#_additional_parameters), so it probably respects the `terminationGracePeriodSeconds` defined on the pod. The question is how the task pod behaves when it gets SIGTERM: is the `terminationGracePeriodSeconds` on the task pod sufficient for it to finish the task?
  - GKE - didn't check
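For reference, a minimal sketch of where `terminationGracePeriodSeconds` lives on a pod spec; the pod name, image tag, and the 7200 value are illustrative assumptions, not values from this discussion:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-task-pod            # illustrative name
spec:
  # How long Kubernetes waits between SIGTERM and SIGKILL.
  # On Azure the Autoscaler caps this at max-graceful-termination-sec
  # (default 600s) regardless of what is set here.
  terminationGracePeriodSeconds: 7200
  containers:
    - name: task
      image: apache/airflow:2.8.1   # illustrative tag
```

To see what a running task pod actually has, you can inspect it with `kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'`.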

GitHub link: 
https://github.com/apache/airflow/discussions/32543#discussioncomment-12285453
