[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shawn updated YUNIKORN-2860:
----------------------------
    Attachment: state-dump.txt

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2860
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>            Reporter: shawn
>            Priority: Major
>         Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, state-dump.txt
>
>
>   
>   When I submit 4 gang apps to YuniKorn simultaneously, sometimes all 4 apps 
> stay pending while two pgs are running, which is not expected.
>  It can be reproduced as follows:
>  1. Configure the queues:
>       kubectl create configmap yunikorn-configs --from-file=queues.yaml -n yunikorn
>  * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
>  2. Simultaneously submit gang-scheduling-job-example1-4.yaml; the four 
> files differ only in name and applicationId
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "999999999"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
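
For anyone trying to reproduce step 2, here is a minimal Python sketch of one way to derive the four variants from the single manifest quoted above and submit them at roughly the same time. It is not taken from the report: the helper names render_variants and submit_all are hypothetical, the variant names gang-scheduling-job-example1 to gang-scheduling-job-example4 are an assumption based on the "example1-4" file naming, and it assumes kubectl is on the PATH and that the jobs go to the default namespace (the namespace used by the kubectl get pods command in the report).

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Name used for both metadata.name and the applicationId label in the manifest above.
BASE_NAME = "gang-scheduling-job-example1"


def render_variants(template, count=4):
    """Return {job_name: manifest_text}; only name and applicationId change."""
    variants = {}
    for i in range(1, count + 1):
        name = f"gang-scheduling-job-example{i}"  # assumed naming scheme
        variants[name] = template.replace(BASE_NAME, name)
    return variants


def submit_all(manifests):
    """Start one `kubectl apply -f -` per manifest in parallel; return exit codes."""
    def apply(text):
        result = subprocess.run(
            ["kubectl", "apply", "-n", "default", "-f", "-"],
            input=text,
            text=True,
        )
        return result.returncode

    with ThreadPoolExecutor(max_workers=len(manifests)) as pool:
        return list(pool.map(apply, manifests.values()))
```

Reading the manifest text from a local file and passing it through render_variants and then submit_all submits all four jobs from separate threads. Since the report says the problem is not always reproducible, several attempts may be needed.
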
> Finally, kubectl get pods -n default shows an unexpected result (not always 
> reproducible).
> The queues web UI and the app state are shown in the attached screenshots.
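
The numbers in the two configurations above can be checked with a little arithmetic. The Python sketch below only compares the queue max against the task group request. It does not model how the scheduler actually allocates, and the variable and function names are mine, not YuniKorn's.

```python
# Values copied from queues.yaml and the task-groups annotation above.
QUEUE_MAX = {"vcore": 2, "memory_g": 2}    # max of root.my-dev
PLACEHOLDER = {"vcore": 1, "memory_g": 1}  # minResource of task-group-example-0
MIN_MEMBER = 2                             # minMember of task-group-example-0


def gang_demand(placeholder, min_member):
    """Total resources one application needs for its complete gang."""
    return {k: v * min_member for k, v in placeholder.items()}


def units_that_fit(capacity, demand):
    """How many copies of `demand` fit into `capacity` at the same time."""
    return min(capacity[k] // demand[k] for k in demand)


full_gangs = units_that_fit(QUEUE_MAX, gang_demand(PLACEHOLDER, MIN_MEMBER))
placeholders = units_that_fit(QUEUE_MAX, PLACEHOLDER)
print(full_gangs, placeholders)  # 1 2
```

So the queue has room for exactly one complete gang, or for two placeholders in total. If "two pgs" in the report means two placeholders belonging to different applications (my reading, not confirmed by the reporter), those two would use the whole queue max while no application has its full gang of 2, which would match the pending state described.
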



--
This message was sent by Atlassian Jira
(v8.20.10#820010)