wangzhihui created YUNIKORN-2926:
------------------------------------
Summary: The Pod using gang scheduling is stuck in the Pending state
Key: YUNIKORN-2926
URL: https://issues.apache.org/jira/browse/YUNIKORN-2926
Project: Apache YuniKorn
Issue Type: Bug
Components: core - scheduler
Reporter: wangzhihui
Attachments: image-2024-10-15-11-54-33-458.png, image.png
desc:
The cause is that the real allocation is larger than every placeholder, so all
allocations are released. As a result, all Pods stay in the Pending state.
!image-2024-10-15-11-54-33-458.png!
!image.png!
{code:yaml}
apiVersion: batch/v1
kind: Job
metadata:
  name: simple-gang-job
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "simple-gang-job"
        queue: root.default
      annotations:
        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
        yunikorn.apache.org/task-group-name: task-group-example
        yunikorn.apache.org/task-groups: |-
          [{
            "name": "task-group-example",
            "minMember": 1,
            "minResource": {
              "cpu": "100m",
              "memory": "50M"
            },
            "nodeSelector": {},
            "tolerations": [],
            "affinity": {},
            "topologySpreadConstraints": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "alpine:latest"
          command: ["sleep", "99999999"]
          resources:
            requests:
              # Larger than the 100m cpu declared in minResource above.
              cpu: "200m"
              memory: "50M"
{code}
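The mismatch in the manifest above can be checked by hand: the placeholder is sized from minResource (cpu 100m, memory 50M), while the real container requests cpu 200m and memory 50M. A minimal Go sketch of that comparison follows. The Resource type and the fitsIn helper are hypothetical names used only for illustration; they are not YuniKorn's API.

```go
package main

import "fmt"

// Resource maps a resource name to a quantity. Hypothetical type for this
// sketch: cpu is in millicores, memory is in bytes.
type Resource map[string]int64

// fitsIn reports whether every quantity in ask is covered by holder.
// A resource that is missing from holder counts as zero.
func fitsIn(ask, holder Resource) bool {
	for name, want := range ask {
		if want > holder[name] {
			return false
		}
	}
	return true
}

func main() {
	// Values taken from the Job manifest in this report.
	placeholder := Resource{"cpu": 100, "memory": 50_000_000}
	realPod := Resource{"cpu": 200, "memory": 50_000_000}

	// The real pod asks for more cpu than the placeholder holds.
	fmt.Println(fitsIn(realPod, placeholder)) // false
}
```

With minMember set to 1 there is a single placeholder, so the real pod cannot replace any placeholder in this reproduction.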
solution:
If the app is in Hard mode, it will transition to a Failing state. If it is in
Soft mode, it will transition to a Resuming state.
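The proposed transition can be sketched as a small function. This is only an illustration under assumptions: the GangStyle type and the nextState function are hypothetical and do not reflect YuniKorn's actual state machine code. Only the style names (Hard, Soft) and the state names (Failing, Resuming) come from this report.

```go
package main

import "fmt"

// GangStyle is a hypothetical type mirroring the gangSchedulingStyle
// parameter from the Job annotation.
type GangStyle string

const (
	Hard GangStyle = "Hard"
	Soft GangStyle = "Soft"
)

// nextState returns the application state proposed in this report for the
// case where the real allocation is larger than every placeholder.
func nextState(style GangStyle) (string, error) {
	switch style {
	case Hard:
		return "Failing", nil
	case Soft:
		return "Resuming", nil
	default:
		return "", fmt.Errorf("unknown gang scheduling style %q", style)
	}
}

func main() {
	for _, style := range []GangStyle{Hard, Soft} {
		state, err := nextState(style)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", style, state)
	}
}
```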