[ 
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17888881#comment-17888881
 ] 

wangzhihui commented on YUNIKORN-2860:
--------------------------------------

Hi [~wilfreds],
I have an idea. Suppose the Kubernetes cluster uses YuniKorn as its sole 
scheduler. Then we would not need to create Pods for placeholders; we could 
instead add placeholder records to the Node in the YuniKorn core. That would 
simplify the gang scheduling process, and it could solve this problem.
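
A minimal sketch of how scheduler-side placeholder records might behave, written in Python only to keep it short. Everything in it is hypothetical: `Queue`, `Node`, and `reserve_gang` are illustration names, not YuniKorn core types or APIs, and node capacity is ignored. It assumes that keeping placeholders in memory lets the scheduler reserve a whole gang or nothing in one step. Whether partial placeholder placement is the actual cause of this issue is not confirmed here.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical remaining quota of a queue (not a YuniKorn type)."""
    free_vcore: int
    free_memory_gb: int


@dataclass
class Node:
    """Hypothetical scheduler-side node holding placeholder records."""
    name: str
    # In-memory records that stand in for placeholder Pods.
    placeholders: list = field(default_factory=list)


def reserve_gang(queue, node, app_id, min_member, vcore, memory_gb):
    """Reserve min_member placeholder records for app_id, all or nothing."""
    need_vcore = min_member * vcore
    need_memory = min_member * memory_gb
    if need_vcore > queue.free_vcore or need_memory > queue.free_memory_gb:
        # The whole gang does not fit, so no partial reservation is kept.
        return False
    queue.free_vcore -= need_vcore
    queue.free_memory_gb -= need_memory
    node.placeholders.extend(
        (app_id, vcore, memory_gb) for _ in range(min_member)
    )
    return True


# Values mirror the report: queue max of 2 vcore / 2G, and four apps that
# each need 2 members of 1 vcore / 1G.
queue = Queue(free_vcore=2, free_memory_gb=2)
node = Node(name="node-1")
results = [
    reserve_gang(queue, node, f"gang-scheduling-job-example{i}", 2, 1, 1)
    for i in range(1, 5)
]
print(results)
```

In this sketch only the first application reserves its gang. The other three are rejected without holding any placeholder, so no application keeps a partial gang that blocks the rest.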

> submit gang applications Simultaneously  may cause unexpected pending apps 
> ---------------------------------------------------------------------------
>
>                 Key: YUNIKORN-2860
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>    Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
>            Reporter: shawn
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: image-2024-09-11-15-41-12-142.png, 
> image-2024-09-11-15-42-07-739.png, image-2024-09-13-15-33-13-964.png, 
> image-2024-09-13-15-33-19-380.png, image-2024-09-13-15-35-26-177.png, 
> state-dump.txt, yunikorn-scheduler.txt
>
>
>   
> I submit 4 gang apps to YuniKorn simultaneously. Sometimes all 4 apps stay 
> pending while two pgs are running, which is not expected.
> It can be reproduced as follows:
> 1. Create the queue configuration:
> kubectl create configmap yunikorn-configs --from-file=queues.yaml -n yunikorn
>  * queues.yaml
> {code:yaml}
> partitions:
>   - name: default
>     queues:
>       - name: root
>         queues:
>           - name: my-dev
>             submitacl: "*"
>             resources:
>               guaranteed: { memory: 1G, vcore: 1 }
>               max: { memory: 2G, vcore: 2 }{code}
> 2. Simultaneously submit gang-scheduling-job-example1.yaml through 
> gang-scheduling-job-example4.yaml, which differ only in name and applicationId:
> {code:yaml}
> apiVersion: batch/v1
> kind: Job
> metadata:
>   name: gang-scheduling-job-example1
> spec:
>   completions: 2
>   parallelism: 2
>   template:
>     metadata:
>       labels:
>         app: sleep
>         applicationId: "gang-scheduling-job-example1"
>         queue: root.my-dev
>       annotations:
>         yunikorn.apache.org/task-group-name: task-group-example-0
>         yunikorn.apache.org/task-groups: |-
>           [{
>               "name": "task-group-example-0",
>               "minMember": 2,
>               "minResource": {
>                 "cpu": "1",
>                 "memory": "1G"
>               },
>               "nodeSelector": {},
>               "tolerations": [],
>               "affinity": {}
>           }]
>     spec:
>       schedulerName: yunikorn
>       restartPolicy: Never
>       containers:
>         - name: sleep30
>           image: "nginx:latest"
>           command: ["sleep", "999999999"]
>           resources:
>             requests:
>               cpu: "1"
>               memory: "1G" {code}
> Finally, kubectl get pods -n default shows an unexpected result (not always 
> reproducible):
> !image-2024-09-11-15-41-12-142.png!
>  
> The app state is as follows:
> !image-2024-09-11-15-42-07-739.png|width=754,height=280!
> The full state dump is in state-dump.txt, and the YuniKorn scheduler logs are 
> in yunikorn-scheduler.txt.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
