[
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890992#comment-17890992
]
Olivier Sevin edited comment on YUNIKORN-2895 at 10/18/24 3:58 PM:
-------------------------------------------------------------------
"4. Do you also see orphan allocations for 1.6.0? This Jira want to handle this
according to the slack channel posting."
We do see orphan allocations (though we no longer hit the negative resource
check) even after applying YUNIKORN-2910.
When we create a lot of pods at once we sometimes get OutOfCpu errors, where it
looks like pods are being scheduled onto machines where they don't fit. We also
end up with a bunch of pending pods because the scheduler thinks empty machines
have something on them. I think these are related, because the OutOfCpu pods
show up as orphan allocations on nodes (the wrong nodes) that should be empty.
Below is a dump where we had ~50 pods pending for hours (they only schedule
again when we restart YuniKorn). The cluster-autoscaler won't do anything for
us because there are already plenty of machines they will fit on.
For example, in the attached dump the node
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-xtqc should be empty except for
daemonsets. It claims to have "dataops0000000002wcv-n0-0-driver" allocated on
it, which isn't true; that allocation is reported by the "Orphan allocation on
node check" in the logs and on the status page.
That pod was never scheduled on that node: it was assigned to
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8 (and got an OutOfCpu error).
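As a rough illustration of how this can be cross-checked (the
/ws/v1/partition/default/nodes path, port 9080, and the nodeID / allocations /
allocationKey / applicationId field names are assumptions and may not match
every YuniKorn version), something like the following Go snippet prints what
the scheduler believes is allocated on each node:

// check_node_allocs.go: rough sketch, not actual YuniKorn tooling.
// Assumes the scheduler web service is reachable on localhost:9080
// (e.g. via kubectl port-forward) and that the REST response is a JSON
// array of nodes with "nodeID" and "allocations" fields.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type allocation struct {
	AllocationKey string `json:"allocationKey"` // assumed field name
	ApplicationID string `json:"applicationId"` // assumed field name
}

type node struct {
	NodeID      string       `json:"nodeID"` // assumed field name
	Allocations []allocation `json:"allocations"`
}

func main() {
	resp, err := http.Get("http://localhost:9080/ws/v1/partition/default/nodes")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var nodes []node
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		panic(err)
	}
	// Print every allocation the scheduler tracks, grouped by node.
	for _, n := range nodes {
		for _, a := range n.Allocations {
			fmt.Printf("%s\t%s\t%s\n", n.NodeID, a.ApplicationID, a.AllocationKey)
		}
	}
}

Comparing that output with the pods actually running on a node (kubectl get
pods -A --field-selector spec.nodeName=<node>) makes allocations like the one
above stand out, since the node is empty apart from daemonsets.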
Logs for dataops0000000002wcv-n0-0-driver:
INFO 2024-10-18T04:59:29.227546628Z
data-operations-production/dataops0000000002wcv-n0-0-driver is queued and
waiting for allocation
WARNING 2024-10-18T04:59:29.227546628Z Pod has inconsistent application
metadata and may be rejected in a future YuniKorn release: label applicationId:
"dataops0000000002wcv" doesn't match label spark-app-selector:
"spark-da5f3013ea454b9c9d2056704dd34591"
INFO 2024-10-18T04:59:29.227546628Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T04:59:29.227546628Z Task
data-operations-production/dataops0000000002wcv-n0-0-driver is pending for the
requested resources become available
INFO 2024-10-18T04:59:44.650502333Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T04:59:56.134091Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T05:00:09Z pod triggered scale-up:
[{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-b/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-grp
4->16 (max: 1000)}
{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-a/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-52438d9c-grp
4->15 (max: 1000)}
{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-c/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-149830b3-grp
4->15 (max: 1000)}]
INFO 2024-10-18T05:00:58.162394Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
WARNING 2024-10-18T05:01:09Z Node didn't have enough resource: cpu, requested:
1000, used: 1776, capacity: 1930
INFO 2024-10-18T05:01:09.477265412Z Successfully assigned
data-operations-production/dataops0000000002wcv-n0-0-driver to node
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8
INFO 2024-10-18T05:01:09.477265412Z Pod
data-operations-production/dataops0000000002wcv-n0-0-driver is successfully
bound to node gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8
INFO 2024-10-18T05:01:09.477265412Z Task
data-operations-production/dataops0000000002wcv-n0-0-driver is completed
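(Assuming those numbers are millicores, the warning above lines up with the
OutOfCpu error: the node only had 1930 - 1776 = 154m CPU free, which cannot
fit the 1000m request.)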
Let me know if anything else would be useful; I can reproduce this again if
needed.
[^orphaned_dataops_1.6_patched.json]
> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Labels: pull-request-available
> Attachments: orphaned_dataops_1.6_patched.json
>
>
> While revisiting the new update-allocation logic, I found that a duplicated
> allocation can be added to a node when the allocation has already been
> allocated: we try to add the allocation to the node again and do not revert
> it when the ask fails.
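Below is a rough sketch of the failure mode being described, using hypothetical
types rather than the actual yunikorn-core structures: adding an allocation
updates the node's tracked usage, so adding the same allocation twice, or not
reverting it when the ask fails, leaves phantom usage and an orphan allocation
behind.

// Hypothetical bookkeeping sketch; type and method names are illustrative
// and are not the real yunikorn-core API.
package main

import "fmt"

type Allocation struct {
	Key      string
	MilliCPU int64
}

type Node struct {
	ID          string
	allocations map[string]*Allocation
	usedCPU     int64
}

func NewNode(id string) *Node {
	return &Node{ID: id, allocations: map[string]*Allocation{}}
}

// AddAllocation only updates usage when the allocation is not already
// tracked; without the duplicate check, usedCPU is inflated and the node
// keeps an allocation that no pod actually backs.
func (n *Node) AddAllocation(a *Allocation) bool {
	if _, exists := n.allocations[a.Key]; exists {
		return false // duplicate: do not double-count
	}
	n.allocations[a.Key] = a
	n.usedCPU += a.MilliCPU
	return true
}

// RemoveAllocation is the revert path that must run when the ask fails
// (e.g. the pod is rejected with OutOfCpu); skipping it leaves an orphan
// allocation on the node.
func (n *Node) RemoveAllocation(key string) {
	if a, ok := n.allocations[key]; ok {
		n.usedCPU -= a.MilliCPU
		delete(n.allocations, key)
	}
}

func main() {
	node := NewNode("example-node")
	alloc := &Allocation{Key: "example-driver", MilliCPU: 1000}

	node.AddAllocation(alloc) // first add: usage goes up by 1000m
	node.AddAllocation(alloc) // retried/updated ask: must be a no-op

	node.RemoveAllocation(alloc.Key) // ask failed: revert the allocation
	fmt.Printf("allocations=%d usedCPU=%dm\n", len(node.allocations), node.usedCPU)
	// prints: allocations=0 usedCPU=0m
}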