[
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890060#comment-17890060
]
wangzhihui commented on YUNIKORN-2860:
--------------------------------------
Thank you for your answer. I need to think more about it.
> submit gang applications Simultaneously may cause unexpected pending apps
> ---------------------------------------------------------------------------
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
> Reporter: shawn
> Assignee: Qi Zhu
> Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png,
> image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png,
> image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png,
> state-dump.txt, yunikorn-scheduler.txt
>
>
>
> When I simultaneously submit 4 gang apps to YuniKorn, sometimes all 4 apps
> stay pending while two pgs are running, which is not expected.
> It can be reproduced as follows:
> 1. Configure the queues:
> kubectl create configmap yunikorn-configs --from-file=queues.yaml -n yunikorn
> * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
> 2. Simultaneously submit the four files gang-scheduling-job-example1-4.yaml,
> which differ only in name and applicationId:
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "999999999"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default shows an unexpected result (not always
> reproducible):
> !image-2024-09-11-15-41-12-142.png!
>
> The app state is as follows:
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> The full state dump is in state-dump.txt, and the YuniKorn scheduler logs
> are in yunikorn-scheduler.txt.
>
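
As a rough check of the numbers in the quoted reproduction, the sketch below compares the my-dev queue max (2 vcore, 2G) with what one gang asks for (minMember 2 at 1 cpu and 1G each). It is only arithmetic under stated assumptions, not YuniKorn's scheduling logic: the names Resources, gangs_that_fit and is_stuck are hypothetical, and it assumes the queue max is the only limit and that every placeholder reserves exactly minResource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical helper type: whole vcores and whole gigabytes of memory."""
    vcore: int
    memory_g: int

    def scaled(self, n: int) -> "Resources":
        return Resources(self.vcore * n, self.memory_g * n)

    def plus(self, other: "Resources") -> "Resources":
        return Resources(self.vcore + other.vcore, self.memory_g + other.memory_g)

    def fits_in(self, limit: "Resources") -> bool:
        return self.vcore <= limit.vcore and self.memory_g <= limit.memory_g


# Values copied from the quoted queues.yaml and task-groups annotation.
QUEUE_MAX = Resources(vcore=2, memory_g=2)
MIN_RESOURCE = Resources(vcore=1, memory_g=1)
MIN_MEMBER = 2


def gangs_that_fit(queue_max: Resources, per_member: Resources, min_member: int) -> int:
    """How many complete gangs fit under the queue max at the same time."""
    gang = per_member.scaled(min_member)
    return min(queue_max.vcore // gang.vcore, queue_max.memory_g // gang.memory_g)


def is_stuck(placeholders_per_app, queue_max: Resources,
             per_member: Resources, min_member: int) -> bool:
    """True if no further placeholder fits and no app holds a complete gang."""
    used = per_member.scaled(sum(placeholders_per_app))
    room_for_one_more = used.plus(per_member).fits_in(queue_max)
    any_complete = any(n >= min_member for n in placeholders_per_app)
    return not room_for_one_more and not any_complete


if __name__ == "__main__":
    print("complete gangs that fit:", gangs_that_fit(QUEUE_MAX, MIN_RESOURCE, MIN_MEMBER))
    # Assumed scenario: two of the four apps each hold one placeholder.
    print("stuck with [1, 1, 0, 0]:", is_stuck([1, 1, 0, 0], QUEUE_MAX, MIN_RESOURCE, MIN_MEMBER))
```

With these figures only one complete gang fits in the queue at a time, so a state where two apps each hold one placeholder fills the queue without completing any gang. Whether that is what happened in the reported run is an assumption; the state dump would need to confirm it.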