GitHub user arkadiusz-bach edited a comment on the discussion: Intermittent SIGTERM running on K8S
@b32n, @mschueler do you have `.Values.workers.safeToEvict` set to `false`? Is the Cluster Autoscaler enabled in your k8s cluster? Not so long ago the default was `true`; it was changed to `false` 8 months ago [here](https://github.com/apache/airflow/commit/4c9f12d2bce5af0c629e9b9e9916cd525817931d).

- The pod may receive SIGTERM when the Autoscaler tries to remove the node because it is underutilized.
- It may behave differently on different clouds:
  - Azure - the Autoscaler has its own termination grace period (`max-graceful-termination-sec`) and ignores the `terminationGracePeriodSeconds` defined on the pod. By default it is set to 600 seconds. So when the Autoscaler tries to downscale a node, it sends SIGTERM to the pods on that node; if the pods cannot shut down within 600 seconds they are stopped, even if the pod's `terminationGracePeriodSeconds` is 7200 seconds.
  - EKS - I couldn't find anything similar to `max-graceful-termination-sec` among the EKS Autoscaler [settings](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html#_additional_parameters); it probably respects the `terminationGracePeriodSeconds` defined on the pod. The question is how the task pod behaves if it gets SIGTERM, and whether the `terminationGracePeriodSeconds` on the task pod is sufficient for it to finish the task.
  - GKE - didn't check

GitHub link: https://github.com/apache/airflow/discussions/32543#discussioncomment-12285453
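For reference, a minimal sketch of what the `safeToEvict` setting above controls in the apache-airflow Helm chart (the exact rendered annotation is an assumption based on the standard Cluster Autoscaler convention):

```yaml
# values.yaml override for the apache-airflow Helm chart:
# prevents the Cluster Autoscaler from evicting worker pods mid-task
# when it wants to remove an underutilized node.
workers:
  safeToEvict: false

# The chart renders this onto worker pods as the annotation:
#
#   metadata:
#     annotations:
#       cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
#
# With "false", the Autoscaler will not drain the node hosting the pod,
# so the pod never receives an Autoscaler-initiated SIGTERM.
```

Note that this only stops Autoscaler-driven evictions; pods can still receive SIGTERM from other sources (e.g. manual node drains or pod deletion).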
