[
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888881#comment-17888881
]
wangzhihui commented on YUNIKORN-2860:
--------------------------------------
Hi [~wilfreds],
I have an idea. Suppose YuniKorn is the sole scheduler in the Kubernetes
cluster. Then we do not need to create Pods for placeholders; instead, we
can add placeholder records to the Node in the YuniKorn core. This would
simplify the gang scheduling process, and it can solve this problem.
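
A minimal sketch of the idea, in Python for brevity rather than the Go used
by the core. Everything here is hypothetical: the Node class, reserve_gang
and bind_real_pod are illustrative names, not YuniKorn APIs. The capacity
(2 vcore, 2G) is borrowed from the queue max in the report purely as an
example, and the all-or-nothing reservation is an extra assumption that is
not part of the proposal itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical core-side node; not YuniKorn's actual Node type."""
    vcore: int
    memory_gb: int
    # app_id -> number of placeholder records still held for that app
    placeholders: dict = field(default_factory=dict)
    used_vcore: int = 0
    used_memory_gb: int = 0

    def reserve_gang(self, app_id, min_member, vcore, memory_gb):
        """Record min_member placeholders at once, or none at all (assumption)."""
        need_vcore = min_member * vcore
        need_memory = min_member * memory_gb
        if (self.used_vcore + need_vcore > self.vcore
                or self.used_memory_gb + need_memory > self.memory_gb):
            return False
        self.used_vcore += need_vcore
        self.used_memory_gb += need_memory
        self.placeholders[app_id] = min_member
        return True

    def bind_real_pod(self, app_id):
        """Swap one placeholder record for a real pod; usage stays unchanged."""
        if self.placeholders.get(app_id, 0) == 0:
            return False
        self.placeholders[app_id] -= 1
        return True


# Four gangs, each with minMember 2 and 1 vcore / 1G per member.
node = Node(vcore=2, memory_gb=2)
results = [
    node.reserve_gang(f"gang-scheduling-job-example{i}", 2, 1, 1)
    for i in range(1, 5)
]
print(results)  # [True, False, False, False]
```

In this sketch no Pod is created for a placeholder, so a reservation is only
a bookkeeping entry that a real pod later takes over.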
> submit gang applications Simultaneously may cause unexpected pending apps
> ---------------------------------------------------------------------------
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
> Reporter: shawn
> Assignee: Qi Zhu
> Priority: Major
> Attachments: image-2024-09-11-15-41-12-142.png,
> image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png,
> image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png,
> state-dump.txt, yunikorn-scheduler.txt
>
>
>
> I simultaneously submit 4 gang apps to YuniKorn. Sometimes all 4 apps stay
> pending while two pgs are running, which is not expected.
> It can be reproduced as follows:
> 1. Create the queues configuration:
> kubectl create configmap yunikorn-configs --from-file=queues.yaml -n
> yunikorn
> * queues.yaml
> {code:java}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
> 2. Simultaneously submit gang-scheduling-job-example1-4.yaml. The four
> files differ only in name and applicationId.
> {code:java}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "999999999"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default shows an unexpected result (not
> always reproducible):
> !image-2024-09-11-15-41-12-142.png!
>
> The app state is as follows:
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> The full state dump is in state-dump.txt, and the YuniKorn scheduler logs
> are in yunikorn-scheduler.txt.
>