[
https://issues.apache.org/jira/browse/YUNIKORN-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890992#comment-17890992
]
Olivier Sevin edited comment on YUNIKORN-2895 at 10/18/24 3:58 PM:
-------------------------------------------------------------------
"4. Do you also see orphan allocations for 1.6.0? This Jira want to handle this
according to the slack channel posting."
We do see orphan allocations (though we no longer hit the negative resource
check) even after applying YUNIKORN-2910.
When we create a lot of pods at once we sometimes get OutOfCpu errors, where it
looks like pods are being scheduled onto machines where they don't fit. We also
end up with a bunch of pending pods because the scheduler thinks empty machines
have something on them. I think these are related, because the OutOfCpu pods
show up as orphan allocations on nodes (the wrong nodes) that should be empty.
Below is a dump where we had ~50 pods pending for hours (they only schedule
again when we restart YuniKorn). The cluster-autoscaler won't do anything for
us because there are already plenty of machines they will fit on.
For example, in the attached dump the node
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-xtqc should be empty except for
daemonsets. It claims to have "dataops0000000002wcv-n0-0-driver" allocated on
it, which isn't true; that allocation is reported by the "Orphan allocation on
node check" in the logs and on the status page.
That pod was never scheduled on that node: it was assigned to
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8 (and got an OutOfCpu error).
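As a rough illustration of how this can be cross-checked (the
/ws/v1/partition/default/nodes path, port 9080, and the nodeID / allocations /
allocationKey / applicationId field names are assumptions and may not match
every YuniKorn version), something like the following Go snippet prints what
the scheduler believes is allocated on each node:

// check_node_allocs.go: rough sketch, not actual YuniKorn tooling.
// Assumes the scheduler web service is reachable on localhost:9080
// (e.g. via kubectl port-forward) and that the REST response is a JSON
// array of nodes with "nodeID" and "allocations" fields.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type allocation struct {
	AllocationKey string `json:"allocationKey"` // assumed field name
	ApplicationID string `json:"applicationId"` // assumed field name
}

type node struct {
	NodeID      string       `json:"nodeID"` // assumed field name
	Allocations []allocation `json:"allocations"`
}

func main() {
	resp, err := http.Get("http://localhost:9080/ws/v1/partition/default/nodes")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var nodes []node
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		panic(err)
	}
	// Print every allocation the scheduler tracks, grouped by node.
	for _, n := range nodes {
		for _, a := range n.Allocations {
			fmt.Printf("%s\t%s\t%s\n", n.NodeID, a.ApplicationID, a.AllocationKey)
		}
	}
}

Comparing that output with the pods actually running on a node (kubectl get
pods -A --field-selector spec.nodeName=<node>) makes allocations like the one
above stand out, since the node is empty apart from daemonsets.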
Logs for dataops0000000002wcv-n0-0-driver:
INFO 2024-10-18T04:59:29.227546628Z
data-operations-production/dataops0000000002wcv-n0-0-driver is queued and
waiting for allocation
WARNING 2024-10-18T04:59:29.227546628Z Pod has inconsistent application
metadata and may be rejected in a future YuniKorn release: label applicationId:
"dataops0000000002wcv" doesn't match label spark-app-selector:
"spark-da5f3013ea454b9c9d2056704dd34591"
INFO 2024-10-18T04:59:29.227546628Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T04:59:29.227546628Z Task
data-operations-production/dataops0000000002wcv-n0-0-driver is pending for the
requested resources become available
INFO 2024-10-18T04:59:44.650502333Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T04:59:56.134091Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
INFO 2024-10-18T05:00:09Z pod triggered scale-up:
[{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-b/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-grp
4->16 (max: 1000)}
{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-a/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-52438d9c-grp
4->15 (max: 1000)}
{https://www.googleapis.com/compute/v1/projects/freenome-computational-dev/zones/us-west1-c/instanceGroups/gke-cr-west1-nap-n1-standard-2-6eig89-149830b3-grp
4->15 (max: 1000)}]
INFO 2024-10-18T05:00:58.162394Z Unschedulable request
'87f9d2dd-4933-449e-8159-e3a7bfb07501': failed plugin: 'NodeAffinity' node(s)
didn't match Pod's node affinity/selector (4x);
WARNING 2024-10-18T05:01:09Z Node didn't have enough resource: cpu, requested:
1000, used: 1776, capacity: 1930
INFO 2024-10-18T05:01:09.477265412Z Successfully assigned
data-operations-production/dataops0000000002wcv-n0-0-driver to node
gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8
INFO 2024-10-18T05:01:09.477265412Z Pod
data-operations-production/dataops0000000002wcv-n0-0-driver is successfully
bound to node gke-cr-west1-nap-n1-standard-2-6eig89-55fcd2da-k9b8
INFO 2024-10-18T05:01:09.477265412Z Task
data-operations-production/dataops0000000002wcv-n0-0-driver is completed
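(Assuming those numbers are millicores, the warning above lines up with the
OutOfCpu error: the node only had 1930 - 1776 = 154m CPU free, which cannot
fit the 1000m request.)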
Let me know if anything else would be useful; I can reproduce this again if
needed.
[^orphaned_dataops_1.6_patched.json]
> Don't add duplicated allocation to node when the allocation ask fails
> ---------------------------------------------------------------------
>
> Key: YUNIKORN-2895
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2895
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler
> Reporter: Qi Zhu
> Assignee: Qi Zhu
> Priority: Critical
> Labels: pull-request-available
> Attachments: orphaned_dataops_1.6_patched.json
>
>
> While revisiting the new update-allocation logic, I found that a duplicated
> allocation can be added to a node when the allocation has already been
> allocated: we try to add the allocation to the node again and do not revert
> it when the ask fails.
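Below is a rough sketch of the failure mode being described, using hypothetical
types rather than the actual yunikorn-core structures: adding an allocation
updates the node's tracked usage, so adding the same allocation twice, or not
reverting it when the ask fails, leaves phantom usage and an orphan allocation
behind.

// Hypothetical bookkeeping sketch; type and method names are illustrative
// and are not the real yunikorn-core API.
package main

import "fmt"

type Allocation struct {
	Key      string
	MilliCPU int64
}

type Node struct {
	ID          string
	allocations map[string]*Allocation
	usedCPU     int64
}

func NewNode(id string) *Node {
	return &Node{ID: id, allocations: map[string]*Allocation{}}
}

// AddAllocation only updates usage when the allocation is not already
// tracked; without the duplicate check, usedCPU is inflated and the node
// keeps an allocation that no pod actually backs.
func (n *Node) AddAllocation(a *Allocation) bool {
	if _, exists := n.allocations[a.Key]; exists {
		return false // duplicate: do not double-count
	}
	n.allocations[a.Key] = a
	n.usedCPU += a.MilliCPU
	return true
}

// RemoveAllocation is the revert path that must run when the ask fails
// (e.g. the pod is rejected with OutOfCpu); skipping it leaves an orphan
// allocation on the node.
func (n *Node) RemoveAllocation(key string) {
	if a, ok := n.allocations[key]; ok {
		n.usedCPU -= a.MilliCPU
		delete(n.allocations, key)
	}
}

func main() {
	node := NewNode("example-node")
	alloc := &Allocation{Key: "example-driver", MilliCPU: 1000}

	node.AddAllocation(alloc) // first add: usage goes up by 1000m
	node.AddAllocation(alloc) // retried/updated ask: must be a no-op

	node.RemoveAllocation(alloc.Key) // ask failed: revert the allocation
	fmt.Printf("allocations=%d usedCPU=%dm\n", len(node.allocations), node.usedCPU)
	// prints: allocations=0 usedCPU=0m
}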