[ https://issues.apache.org/jira/browse/YUNIKORN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926635#comment-17926635 ]

Wilfred Spiegelenburg commented on YUNIKORN-3026:
-------------------------------------------------

That would work if only the scheduler were involved. However, on K8s it is not 
just the scheduler.

When a pod gets scheduled, the scheduler looks at the size of the pod and 
searches for a node that has enough resources available for the pod to run. 
The resource requests set on the pod are the only values considered. When a 
match is found, the node name is set on the pod. That triggers the next step: 
the kubelet takes over at that point. The kubelet checks the pod in an 
admission phase to see that it fits. Again, the resource requests are what is 
used for that check.
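
Just to illustrate, here is a minimal Go sketch (not YuniKorn or kubelet code; 
the helper names and the simplification to just CPU and memory are mine) of the 
request-based fit check that both components effectively perform:

{code:go}
// Minimal sketch, not YuniKorn or kubelet code: both the scheduler's fit
// check and the kubelet admission phase boil down to a comparison like this,
// based only on the resource requests declared in the pod spec.
package sketch

import (
    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// podRequests sums the CPU and memory requests of all containers in the pod.
func podRequests(pod *v1.Pod) (cpu, mem resource.Quantity) {
    for _, c := range pod.Spec.Containers {
        if r, ok := c.Resources.Requests[v1.ResourceCPU]; ok {
            cpu.Add(r)
        }
        if r, ok := c.Resources.Requests[v1.ResourceMemory]; ok {
            mem.Add(r)
        }
    }
    return cpu, mem
}

// fits reports whether the pod's requests fit into what is still free on the
// node; the real usage of the containers never enters the calculation.
func fits(pod *v1.Pod, freeCPU, freeMem resource.Quantity) bool {
    cpu, mem := podRequests(pod)
    return cpu.Cmp(freeCPU) <= 0 && mem.Cmp(freeMem) <= 0
}
{code}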

The first option would be for the scheduler to change the requested pod 
resources. That would make sure the admission phase on the kubelet still 
passes. However, as per the K8s API, [updating 
resources|https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources]
 is not possible. Even if it were possible, the complexity involved would be 
enormous. The containers in the pod are what request the resources. How would 
the scheduler decide which container to change, and what about minimum values, 
etc.? There are too many possible scenarios.
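
For reference, a hedged client-go sketch of what such an update attempt would 
look like; the kubeconfig path, namespace and pod name are placeholders, and 
per the pod API documentation linked above the API server rejects the update:

{code:go}
package main

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Placeholder kubeconfig, namespace and pod name; assumes the pod's first
    // container already declares resource requests.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
    if err != nil {
        panic(err)
    }

    // Try to lower the first container's CPU request after scheduling.
    pod.Spec.Containers[0].Resources.Requests[v1.ResourceCPU] = resource.MustParse("100m")

    // Container resources are not an updatable pod field, so this call is
    // expected to come back with a validation error from the API server.
    _, err = clientset.CoreV1().Pods("default").Update(context.TODO(), pod, metav1.UpdateOptions{})
    fmt.Println("update result:", err)
}
{code}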

What is left would then require support for overcommit of resources in the 
scheduler as well as the same overcommit in the kubelet admission phase. That 
means the only option left would be to also change the kubelet. Changing the 
kubelet is completely out of scope for this project; it would require a custom 
K8s distribution.
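
To make that concrete, a small sketch building on the podRequests helper from 
the earlier sketch, with an assumed overcommit factor: without the same change 
in the kubelet, a pod bound this way can still be rejected by the unmodified, 
request-based admission check.

{code:go}
// Sketch only: the scheduler scales the node's free resources by an
// overcommit factor before the fit check. The kubelet admission phase does
// not know about this factor and keeps comparing plain requests, so the two
// checks disagree unless the kubelet is changed as well.
func fitsWithOvercommit(pod *v1.Pod, freeCPU, freeMem resource.Quantity, factor float64) bool {
    cpu, mem := podRequests(pod)
    scaledCPU := resource.NewMilliQuantity(int64(float64(freeCPU.MilliValue())*factor), resource.DecimalSI)
    scaledMem := resource.NewQuantity(int64(float64(freeMem.Value())*factor), resource.BinarySI)
    return cpu.Cmp(*scaledCPU) <= 0 && mem.Cmp(*scaledMem) <= 0
}
{code}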

To wrap it all up: this is not possible.

> Resource overcommit
> -------------------
>
>                 Key: YUNIKORN-3026
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3026
>             Project: Apache YuniKorn
>          Issue Type: New Feature
>          Components: core - scheduler
>            Reporter: RafaƂ Boniecki
>            Priority: Major
>
> Provide an implementation of resource (cpu, memory) overcommit.
> The Kubernetes requests/limits model promotes wasting a lot of resources. It 
> is impossible to set static resources correctly (without wasting resources 
> or causing resource starvation) at pod creation. It would be useful for the 
> scheduler to have the ability to ignore these requests/limits and schedule 
> by real load instead, up to some configured soft limit. When a hard limit is 
> hit (e.g. 60% of real cpu usage and/or 40% of real memory usage) it should 
> be able to deschedule some pods (configurable which ones, e.g. by the 
> presence of an annotation) using some kind of algorithm (e.g. random, 
> highest resource used, lowest resource used, ideally in combination with the 
> priority of the pod and/or its labels/annotations).


