[jira] [Created] (YUNIKORN-1616) Terminating scheduler pods still actively scheduling when replacement pod launches

Eli Schiff (Jira) Fri, 03 Mar 2023 06:41:12 -0800

Eli Schiff created YUNIKORN-1616:
------------------------------------

             Summary: Terminating scheduler pods still actively scheduling when replacement pod launches
                 Key: YUNIKORN-1616
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1616
             Project: Apache YuniKorn
          Issue Type: Bug
            Reporter: Eli Schiff
            Assignee: Eli Schiff



If a YuniKorn scheduler pod is shut down for any reason (e.g. it is deleted 
manually), the pod enters the Terminating state and is fully shut down only 
after roughly 30 seconds. However, as soon as the pod becomes Terminating, the 
ReplicaSet owned by the Kubernetes Deployment creates a replacement pod. This 
opens a race window in which both scheduler pods are actively scheduling for a 
short period of time.

I have seen errors such as `failed to create placeholder pod {"error": "pods 
\"tg-spark-executor-abcdefg-0\" already exists"}`, caused by both scheduler 
pods attempting to create the same placeholder pod at once. I believe this race 
has also left pods stuck in Pending when they should have been scheduled.
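To make the failure mode concrete, here is a toy model (not YuniKorn code) of two scheduler instances racing to create the same placeholder pod. `FakeAPIServer`, `AlreadyExistsError`, `create_placeholder` and `race` are hypothetical names; the in-memory store only mimics the API server rejecting a second pod with a name that already exists.

```python
import threading


class AlreadyExistsError(Exception):
    """Hypothetical stand-in for the API server's AlreadyExists response."""


class FakeAPIServer:
    """Toy in-memory pod store: pod names must be unique, as in a namespace."""

    def __init__(self):
        self._pods = {}
        self._lock = threading.Lock()

    def create_pod(self, name, owner):
        # The check and the insert are atomic, so exactly one creator wins.
        with self._lock:
            if name in self._pods:
                raise AlreadyExistsError(f'pods "{name}" already exists')
            self._pods[name] = owner


def create_placeholder(api, scheduler_id, pod_name, results):
    """One scheduler instance trying to create a placeholder pod."""
    try:
        api.create_pod(pod_name, owner=scheduler_id)
        results[scheduler_id] = "created"
    except AlreadyExistsError as err:
        results[scheduler_id] = f"failed to create placeholder pod: {err}"


def race(pod_name="tg-spark-executor-abcdefg-0"):
    """Run the terminating pod and its replacement against the same task group."""
    api = FakeAPIServer()
    results = {}
    threads = [
        threading.Thread(target=create_placeholder, args=(api, sid, pod_name, results))
        for sid in ("old-terminating-pod", "new-replacement-pod")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for scheduler, outcome in race().items():
        print(f"{scheduler}: {outcome}")
```

Whichever instance reaches the store first wins; the other one gets the "already exists" error. As long as both instances act on the same applications, neither can assume it owns the placeholder it tried to create.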

 

There is an ongoing upstream discussion about adding a way to tell Kubernetes 
Deployments not to start a new pod until the old pod has fully shut down:
[https://github.com/kubernetes/kubernetes/issues/115844]

 

In the meantime, the solution seems to be to switch to a StatefulSet, as the 
Kubernetes documentation on the Recreate deployment strategy points out:

[https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment]
 

> *Note:* This will only guarantee Pod termination previous to creation for 
>upgrades. If you upgrade a Deployment, all Pods of the old revision will be 
>terminated immediately. Successful removal is awaited before any Pod of the 
>new revision is created. If you manually delete a Pod, the lifecycle is 
>controlled by the ReplicaSet and the replacement will be created immediately 
>(even if the old Pod is still in a Terminating state). If you need an "at 
>most" guarantee for your Pods, you should consider using a 
>[StatefulSet|https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/].
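The difference the note describes can be sketched as a rough timing model. This assumes the old scheduler keeps scheduling for its whole termination period (as observed in this issue) and that a replacement starts scheduling `startup_secs` after it is created; both numbers and all function names here are illustrative assumptions, not measured values or Kubernetes APIs.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    old_stops_at: float   # old pod fully removed (end of Terminating)
    new_starts_at: float  # replacement pod begins scheduling


def replicaset_timeline(delete_at, termination_secs, startup_secs):
    # ReplicaSet behaviour per the docs note: the replacement is created
    # immediately, even while the old pod is still Terminating.
    return Timeline(old_stops_at=delete_at + termination_secs,
                    new_starts_at=delete_at + startup_secs)


def statefulset_timeline(delete_at, termination_secs, startup_secs):
    # "At most one" behaviour: the replacement is created only after the
    # old pod has been removed.
    old_gone = delete_at + termination_secs
    return Timeline(old_stops_at=old_gone, new_starts_at=old_gone + startup_secs)


def overlap_secs(t):
    """Seconds during which both the old and the new scheduler are active."""
    return max(0.0, t.old_stops_at - t.new_starts_at)


if __name__ == "__main__":
    for label, build in (("ReplicaSet", replicaset_timeline),
                         ("StatefulSet", statefulset_timeline)):
        t = build(delete_at=0.0, termination_secs=30.0, startup_secs=5.0)
        print(f"{label}: {overlap_secs(t):.0f}s with two active schedulers")
```

With a 30 s termination and a 5 s startup, the ReplicaSet model gives 25 s of overlap and the StatefulSet model gives none. A StatefulSet replica keeps a stable name (for example `<name>-0`), which is why its controller cannot run the old and new pod side by side.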

 

From what I can tell, moving to a StatefulSet here should be a fairly smooth 
transition, but I am not sure whether there are wider issues or implications of 
this change that I am not aware of.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

