[jira] [Updated] (YUNIKORN-2926) The Pod using gang scheduling is stuck in the Pending state

wangzhihui (Jira) Tue, 15 Oct 2024 00:37:17 -0700


     [ https://issues.apache.org/jira/browse/YUNIKORN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


wangzhihui updated YUNIKORN-2926:
---------------------------------
    Description: 
desc:
The real allocation is larger than the placeholders, so all allocations are
released. This leaves all Pods in the Pending state.

!image-2024-10-15-11-54-33-458.png!
!image.png!
{code:yaml}
apiVersion: batch/v1
kind: Job
metadata:
  name: simple-gang-job
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "simple-gang-job"
        queue: root.default
      annotations:
        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
        yunikorn.apache.org/task-group-name: task-group-example
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "task-group-example",
              "minMember": 2,
              "minResource": {
                "cpu": "100m",
                "memory": "50M"
              },
              "nodeSelector": {},
              "tolerations": [],
              "affinity": {},
              "topologySpreadConstraints": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "alpine:latest"
          command: ["sleep", "99999999"]
          resources:
            requests:
              cpu: "200m"
              memory: "50M"
{code}
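
The mismatch in the manifest above can be checked by hand: the task group's
minResource reserves 100m CPU per placeholder, while the container requests
200m CPU. A minimal sketch of that comparison follows, assuming a simplified
two-field resource model. Resource and FitsIn are hypothetical names used only
for illustration; they are not YuniKorn types.

```go
package main

import "fmt"

// Resource is a hypothetical, simplified resource vector (not a YuniKorn type).
type Resource struct {
	MilliCPU    int64 // CPU in millicores, e.g. "200m" -> 200
	MemoryBytes int64 // memory in bytes, e.g. "50M" -> 50 * 1000 * 1000
}

// FitsIn reports whether r is no larger than other in every dimension.
func (r Resource) FitsIn(other Resource) bool {
	return r.MilliCPU <= other.MilliCPU && r.MemoryBytes <= other.MemoryBytes
}

func main() {
	// Values copied from the Job manifest in this issue.
	placeholder := Resource{MilliCPU: 100, MemoryBytes: 50 * 1000 * 1000} // task group minResource
	realPod := Resource{MilliCPU: 200, MemoryBytes: 50 * 1000 * 1000}     // container requests

	// The real pod asks for more CPU than the placeholder reserved.
	fmt.Println("real pod fits placeholder:", realPod.FitsIn(placeholder))
}
```

With these values the memory requests match, so CPU is the only dimension in
which the real pod exceeds its placeholder.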
solution:
If the app is in Hard mode, it will transition to a Failing state. If it is in 
Soft mode, it will transition to a Resuming state.
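
The proposed behaviour can be sketched as a small mapping from gang scheduling
style to the next application state. This is an illustration only: NextState
and the constants below are hypothetical names that mirror the terms used in
this issue, and they are not taken from the YuniKorn code base.

```go
package main

import "fmt"

// Hypothetical constants mirroring the terms used in this issue.
const (
	StyleHard = "Hard"
	StyleSoft = "Soft"

	StateFailing  = "Failing"
	StateResuming = "Resuming"
)

// NextState returns the application state proposed in this issue for the
// given gangSchedulingStyle once the placeholders have been released.
func NextState(style string) (string, error) {
	switch style {
	case StyleHard:
		return StateFailing, nil
	case StyleSoft:
		return StateResuming, nil
	default:
		return "", fmt.Errorf("unknown gangSchedulingStyle %q", style)
	}
}

func main() {
	for _, style := range []string{StyleHard, StyleSoft} {
		next, err := NextState(style)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", style, next)
	}
}
```

The Job in this issue sets gangSchedulingStyle=Hard, so under this proposal it
would move to Failing instead of leaving its Pods in Pending.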

  was:
desc:
The real allocation is larger than the placeholders, so all allocations are
released. This leaves all Pods in the Pending state.

!image-2024-10-15-11-54-33-458.png!
!image.png!
{code:yaml}
apiVersion: batch/v1
kind: Job
metadata:
  name: simple-gang-job
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "simple-gang-job"
        queue: root.default
      annotations:
        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
        yunikorn.apache.org/task-group-name: task-group-example
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "task-group-example",
              "minMember": 1,
              "minResource": {
                "cpu": "100m",
                "memory": "50M"
              },
              "nodeSelector": {},
              "tolerations": [],
              "affinity": {},
              "topologySpreadConstraints": []
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "alpine:latest"
          command: ["sleep", "99999999"]
          resources:
            requests:
              cpu: "200m"
              memory: "50M"
{code}
solution:
If the app is in Hard mode, it will transition to a Failing state. If it is in 
Soft mode, it will transition to a Resuming state.


> The Pod using gang scheduling is stuck in the Pending state
> -----------------------------------------------------------
>
>                 Key: YUNIKORN-2926
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2926
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: wangzhihui
>            Priority: Minor
>             Fix For: 1.5.0
>
>         Attachments: image-2024-10-15-11-54-33-458.png, image.png
>
>
> desc:
> The real allocation is larger than the placeholders, so all allocations are
> released. This leaves all Pods in the Pending state.
> !image-2024-10-15-11-54-33-458.png!
> !image.png!
> {code:yaml}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: simple-gang-job
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "simple-gang-job"
>         queue: root.default
>       annotations:
>         yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
>         yunikorn.apache.org/task-group-name: task-group-example
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "100m",
>                 "memory": "50M"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {},
>               "topologySpreadConstraints": []
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "alpine:latest"
>           command: ["sleep", "99999999"]
>           resources:
>             requests:
>               cpu: "200m"
>               memory: "50M"
> {code}
> solution:
> If the app is in Hard mode, it will transition to a Failing state. If it is 
> in Soft mode, it will transition to a Resuming state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
