[ https://issues.apache.org/jira/browse/YUNIKORN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926635#comment-17926635 ]
Wilfred Spiegelenburg commented on YUNIKORN-3026:
-------------------------------------------------
That would work if it were just the scheduler involved. However, on K8s it is
not just the scheduler.
When a pod gets scheduled, the scheduler looks at the size of the pod and
searches for a node that has enough resources available for the pod to run.
Only the resource requests set on the pod are taken into account. When a match
is found, the node name is set on the pod. That triggers the next step: the
kubelet takes over at that point. The kubelet checks the pod in an admission
phase to see whether it fits. Again, the resource requests are used for that.
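To illustrate that flow, here is a minimal client-go sketch of the bind step
(the pod and node names are hypothetical, and this is not YuniKorn's actual
code path; the shim's real binding logic is more involved):
{code:go}
// bind.go: minimal sketch of a scheduler's bind step.
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Setting the node name on the pod is done through the pods/binding
	// subresource. After this call the kubelet on "node-1" runs its own
	// admission checks, again based on the pod's resource requests.
	binding := &v1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: "my-pod", Namespace: "default"},
		Target:     v1.ObjectReference{Kind: "Node", Name: "node-1"},
	}
	err = client.CoreV1().Pods("default").Bind(context.TODO(), binding, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
{code}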
The first option would be for the scheduler to change the requested pod
resources. That would make sure the admission phase on the kubelet still
passes. However, as per the K8s API, [updating
resources|https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources]
is not possible. Even if it were possible, the complexity involved would be
enormous. Resources are requested per container in the pod. How would the
scheduler decide which container to change, and what about minimal values,
etc.? There are too many possible scenarios.
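To make the API limitation concrete, a sketch of such an update attempt with
client-go (the pod name and values are hypothetical; the exact error wording
varies by K8s version):
{code:go}
// resize.go: sketch of trying to change a running pod's resource requests.
// The API server rejects the update because container resources are an
// immutable field on an existing pod.
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	pod, err := client.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// Bump the first container's cpu request and try to write it back.
	if pod.Spec.Containers[0].Resources.Requests == nil {
		pod.Spec.Containers[0].Resources.Requests = v1.ResourceList{}
	}
	pod.Spec.Containers[0].Resources.Requests[v1.ResourceCPU] = resource.MustParse("2")
	_, err = client.CoreV1().Pods("default").Update(context.TODO(), pod, metav1.UpdateOptions{})
	// Expected: "Forbidden: pod updates may not change fields other than ..."
	fmt.Println(err)
}
{code}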
What is left would then require support for overcommitting resources in the
scheduler, as well as the same overcommit logic in the kubelet admission
phase. That means the only option left would be to also change the kubelet.
Changing the kubelet is completely out of scope for this project; it would
require a custom K8s distribution.
To wrap it all up: this is not possible.
> Resource overcommit
> -------------------
>
> Key: YUNIKORN-3026
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3026
> Project: Apache YuniKorn
> Issue Type: New Feature
> Components: core - scheduler
> Reporter: Rafał Boniecki
> Priority: Major
>
> Provide implementation of resource (cpu, memory) overcommit.
> The Kubernetes requests/limits model promotes wasting a lot of resources.
> It's impossible to correctly (without wasting resources or causing resource
> starvation) set static resources at pod creation. It would be useful for the
> scheduler to have the ability to ignore these requests/limits and instead
> schedule by real load, up to some configured soft limit. When a hard limit is
> hit (e.g. 60% of real cpu usage and/or 40% of real memory usage), it should
> have the ability to deschedule some pods (configurable which ones, e.g. by
> presence of an annotation) using some kind of algorithm (e.g. random/highest
> resource used/lowest resource used, ideally in combination with the priority
> of the pod and/or its labels/annotations).
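For reference only, the descheduling policy requested above could be sketched
like this (the thresholds, the real-usage source, and the selection rule are
all hypothetical placeholders, not an existing YuniKorn feature):
{code:go}
// deschedule.go: rough sketch of a load-based descheduling decision.
package main

import (
	"fmt"
	"sort"
)

type podInfo struct {
	Name      string
	Priority  int32
	CPUUsage  float64 // fraction of the node's cpu actually used by the pod
	Evictable bool    // e.g. derived from a pod annotation
}

const (
	softCPULimit = 0.60 // hypothetical: stop placing new pods above this
	hardCPULimit = 0.75 // hypothetical: start descheduling above this
)

// podsToDeschedule picks evictable pods, lowest priority and highest real
// usage first, until projected node usage drops back under the soft limit.
func podsToDeschedule(nodeCPUUsage float64, pods []podInfo) []podInfo {
	if nodeCPUUsage <= hardCPULimit {
		return nil
	}
	var candidates []podInfo
	for _, p := range pods {
		if p.Evictable {
			candidates = append(candidates, p)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		if candidates[i].Priority != candidates[j].Priority {
			return candidates[i].Priority < candidates[j].Priority
		}
		return candidates[i].CPUUsage > candidates[j].CPUUsage
	})
	var victims []podInfo
	for _, p := range candidates {
		if nodeCPUUsage <= softCPULimit {
			break
		}
		victims = append(victims, p)
		nodeCPUUsage -= p.CPUUsage
	}
	return victims
}

func main() {
	pods := []podInfo{
		{Name: "a", Priority: 100, CPUUsage: 0.30, Evictable: true},
		{Name: "b", Priority: 50, CPUUsage: 0.25, Evictable: true},
		{Name: "c", Priority: 100, CPUUsage: 0.20, Evictable: false},
	}
	fmt.Println(podsToDeschedule(0.80, pods)) // evicts "b" first
}
{code}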