GitHub user arkadiusz-bach edited a comment on the discussion: Intermittent 
SIGTERM running on K8S

@b32n , @mschueler do you have `.Values.workers.safeToEvict` set to `false`? Is the Cluster Autoscaler enabled in your k8s cluster?

Until not so long ago it defaulted to `true`; 8 months ago the default was changed to `false`: [here](https://github.com/apache/airflow/commit/4c9f12d2bce5af0c629e9b9e9916cd525817931d)
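A minimal sketch of the relevant Helm values override, assuming the official `apache/airflow` chart's `workers.safeToEvict` key (which renders the `cluster-autoscaler.kubernetes.io/safe-to-evict` annotation on worker pods):

```yaml
# values.yaml override for the official Airflow Helm chart
workers:
  safeToEvict: false

# Which renders roughly this annotation on each worker pod,
# telling the Cluster Autoscaler not to evict it when draining
# an underutilized node:
#
#   metadata:
#     annotations:
#       cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```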

- The pod may receive SIGTERM when the Autoscaler tries to remove a node because it is underutilized.
- It may behave differently on different clouds:
  - Azure - the Autoscaler has its own termination grace period (`max-graceful-termination-sec`) and ignores the `terminationGracePeriodSeconds` defined on the pod. By default it is 600 seconds, so when the Autoscaler downscales a node it sends SIGTERM to the pods on that node, and any pod that has not shut down within 600 seconds is killed, even if its own `terminationGracePeriodSeconds` is 7200.
  - EKS - I couldn't find anything similar to `max-graceful-termination-sec` among the EKS Autoscaler [settings](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html#_additional_parameters), so it probably respects the `terminationGracePeriodSeconds` defined on the pod. The question is how the task pod behaves when it gets SIGTERM: is the `terminationGracePeriodSeconds` on the task pod sufficient for it to finish the task?
  - GKE - didn't check
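For reference, a minimal sketch of where `terminationGracePeriodSeconds` lives on a pod spec; the pod name, image tag, and the 7200 value are illustrative assumptions, not values from this discussion:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-task-pod            # illustrative name
spec:
  # How long Kubernetes waits between SIGTERM and SIGKILL.
  # On Azure the Autoscaler caps this at max-graceful-termination-sec
  # (default 600s) regardless of what is set here.
  terminationGracePeriodSeconds: 7200
  containers:
    - name: task
      image: apache/airflow:2.8.1   # illustrative tag
```

To see what a running task pod actually has, you can inspect it with `kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'`.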

GitHub link: 
https://github.com/apache/airflow/discussions/32543#discussioncomment-12285453
