[
https://issues.apache.org/jira/browse/YUNIKORN-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
shawn updated YUNIKORN-2860:
----------------------------
Description:
When I submit 4 gang apps to YuniKorn simultaneously, sometimes all 4 apps
stay pending while two pgs are running, which is not expected.
It can be reproduced as follows:
1. Create the queues config map:
kubectl create configmap yunikorn-configs --from-file=queues.yaml -n yunikorn
* queues.yaml
{code:java}
partitions:
  - name: default
    queues:
      - name: root
        queues:
          - name: my-dev
            submitacl: "*"
            resources:
              guaranteed: { memory: 1G, vcore: 1 }
              max: { memory: 2G, vcore: 2 }
{code}
2. Simultaneously submit gang-scheduling-job-example1.yaml through
gang-scheduling-job-example4.yaml, which differ only in name and applicationId:
{code:java}
apiVersion: batch/v1
kind: Job
metadata:
  name: gang-scheduling-job-example1
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "gang-scheduling-job-example1"
        queue: root.my-dev
      annotations:
        yunikorn.apache.org/task-group-name: task-group-example-0
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "task-group-example-0",
              "minMember": 2,
              "minResource": {
                "cpu": "1",
                "memory": "1G"
              },
              "nodeSelector": {},
              "tolerations": [],
              "affinity": {}
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "nginx:latest"
          command: ["sleep", "999999999"]
          resources:
            requests:
              cpu: "1"
              memory: "1G"
{code}
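Step 2 can be scripted. The sketch below is not part of the original report: it renders the four manifests from one template, changing only the Job name and applicationId, and pipes each one to kubectl apply -f - from its own thread. The helper names (render_manifest, kubectl_apply, submit_all), the __APP__ placeholder and the thread-based submission are assumptions made for illustration; the report does not say how the jobs were submitted simultaneously.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Job manifest from step 2. __APP__ is a hypothetical placeholder for the
# two fields that differ between the four jobs (name and applicationId).
TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: __APP__
spec:
  completions: 2
  parallelism: 2
  template:
    metadata:
      labels:
        app: sleep
        applicationId: "__APP__"
        queue: root.my-dev
      annotations:
        yunikorn.apache.org/task-group-name: task-group-example-0
        yunikorn.apache.org/task-groups: |-
          [{
              "name": "task-group-example-0",
              "minMember": 2,
              "minResource": {"cpu": "1", "memory": "1G"},
              "nodeSelector": {},
              "tolerations": [],
              "affinity": {}
          }]
    spec:
      schedulerName: yunikorn
      restartPolicy: Never
      containers:
        - name: sleep30
          image: "nginx:latest"
          command: ["sleep", "999999999"]
          resources:
            requests:
              cpu: "1"
              memory: "1G"
"""


def render_manifest(index):
    """Return the manifest text for gang-scheduling-job-example<index>."""
    return TEMPLATE.replace("__APP__", f"gang-scheduling-job-example{index}")


def kubectl_apply(manifest):
    """Pipe one manifest to `kubectl apply -f -` and return its exit code."""
    result = subprocess.run(
        ["kubectl", "apply", "-n", "default", "-f", "-"],
        input=manifest,
        text=True,
        check=False,
    )
    return result.returncode


def submit_all(count=4, apply=kubectl_apply):
    """Submit `count` jobs from parallel threads, one thread per job."""
    manifests = [render_manifest(i) for i in range(1, count + 1)]
    with ThreadPoolExecutor(max_workers=count) as pool:
        # map() preserves input order, so result i belongs to job i + 1.
        return list(pool.map(apply, manifests))
```

Calling submit_all() requires kubectl to be configured for the target cluster and returns the four exit codes. Passing another callable as apply (for example one that only prints the manifest) gives a dry run.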
Finally, kubectl get pods -n default shows an unexpected result (not always
reproducible):
!http://www.kdocs.cn/api/v3/office/copy/dHZnb0t1QXY5SXVjY0llQW5BcHJabWRzcGNxYm1NMUMyVmo4Mk4yYnhrWFhkZlRCamV6L1h6bHNqOEtyanc3QmpKU04xMDY5WHBTcEhMT2FxbnFGSWU1dVFJMGh1V2x4SXNXRU1KU3dQY2xxSzE4dW5QbkZ3NE5hcWtMOWZPVEtnM2lFRGhLTWNLYUR0NzRFUmNmRHZ2QjNJeTU3NHoyZm96SjNYSWFhc0srbVl4a1hjclJTT1JZVnphaEplSmVibGxXZjgyU0NoNlBpSjV4N2dyc2dIdFFUK0ppbGVrS1VueWxWWEFMd2xqUGpFUUlYSVNqNmxZRjBLY3RwL2pUdHJPbHJ1c1hhNE1vPQ==/attach/object/HCBXYGQ3ABQGY?|width=1032!
Queues web UI:
!http://www.kdocs.cn/api/v3/office/copy/dHZnb0t1QXY5SXVjY0llQW5BcHJabWRzcGNxYm1NMUMyVmo4Mk4yYnhrWFhkZlRCamV6L1h6bHNqOEtyanc3QmpKU04xMDY5WHBTcEhMT2FxbnFGSWU1dVFJMGh1V2x4SXNXRU1KU3dQY2xxSzE4dW5QbkZ3NE5hcWtMOWZPVEtnM2lFRGhLTWNLYUR0NzRFUmNmRHZ2QjNJeTU3NHoyZm96SjNYSWFhc0srbVl4a1hjclJTT1JZVnphaEplSmVibGxXZjgyU0NoNlBpSjV4N2dyc2dIdFFUK0ppbGVrS1VueWxWWEFMd2xqUGpFUUlYSVNqNmxZRjBLY3RwL2pUdHJPbHJ1c1hhNE1vPQ==/attach/object/JKIV4GQ3ACADC?|width=1493!
App state:
!http://www.kdocs.cn/api/v3/office/copy/dHZnb0t1QXY5SXVjY0llQW5BcHJabWRzcGNxYm1NMUMyVmo4Mk4yYnhrWFhkZlRCamV6L1h6bHNqOEtyanc3QmpKU04xMDY5WHBTcEhMT2FxbnFGSWU1dVFJMGh1V2x4SXNXRU1KU3dQY2xxSzE4dW5QbkZ3NE5hcWtMOWZPVEtnM2lFRGhLTWNLYUR0NzRFUmNmRHZ2QjNJeTU3NHoyZm96SjNYSWFhc0srbVl4a1hjclJTT1JZVnphaEplSmVibGxXZjgyU0NoNlBpSjV4N2dyc2dIdFFUK0ppbGVrS1VueWxWWEFMd2xqUGpFUUlYSVNqNmxZRjBLY3RwL2pUdHJPbHJ1c1hhNE1vPQ==/attach/object/7CIH2GQ3ABQEK?|width=1459!
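The sizes used in the reproduction can be checked against each other. The sketch below is an illustration added for clarity, not something taken from the report. It assumes that vcore: 2 in queues.yaml means two CPU cores and that the G values in the queue config and the pod requests are directly comparable; the function names are hypothetical. It computes how many complete gangs fit under the queue max and how much of the queue is left once a given number of placeholders is running. Whether the two running pgs in the screenshots belong to different applications is not established here.

```python
# Queue limit from queues.yaml (root.my-dev, "max").
QUEUE_MAX = {"vcore": 2, "memory_g": 2}

# Task group from the Job annotation: minMember placeholders,
# each requesting minResource.
MIN_MEMBER = 2
MIN_RESOURCE = {"vcore": 1, "memory_g": 1}


def gang_demand(min_member=MIN_MEMBER, min_resource=MIN_RESOURCE):
    """Total resources one application needs for its full gang."""
    return {name: amount * min_member for name, amount in min_resource.items()}


def gangs_that_fit(queue_max=QUEUE_MAX):
    """Number of complete gangs that fit under the queue max at once."""
    demand = gang_demand()
    return min(queue_max[name] // demand[name] for name in demand)


def free_after(placeholders_running, queue_max=QUEUE_MAX):
    """Queue headroom left once this many placeholders are running."""
    return {
        name: queue_max[name] - placeholders_running * MIN_RESOURCE[name]
        for name in queue_max
    }
```

Under these assumptions one gang needs exactly the queue max (2 vcore, 2G), so only one of the four applications can hold a complete gang at a time, and any two running placeholders leave no headroom in root.my-dev.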
> submit gang applications Simultaneously may cause unexpected pending apps
> ---------------------------------------------------------------------------
>
> Key: YUNIKORN-2860
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2860
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Affects Versions: 1.3.0, 1.4.0, 1.5.0, 1.5.1, 1.5.2
> Reporter: shawn
> Priority: Major
>