[jira] [Resolved] (YUNIKORN-1105) Rethink memory resource conversion to MB

Wilfred Spiegelenburg (Jira) Thu, 17 Mar 2022 00:02:04 -0700


     [ 
https://issues.apache.org/jira/browse/YUNIKORN-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Wilfred Spiegelenburg resolved YUNIKORN-1105.
---------------------------------------------
    Fix Version/s: 1.0.0
       Resolution: Fixed

Both the shim and the web UI changes are in.

> Rethink memory resource conversion to MB
> ----------------------------------------
>
>                 Key: YUNIKORN-1105
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1105
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Craig Condit
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>
> The choice to represent memory in units of MB and not in bytes comes with a 
> side effect. We convert a pod or node memory size into a whole number of MB 
> (decimal, base 10: 1MB = 1,000,000 bytes).
> Not every size is expressible in whole MB, so we round up to the next MB: 1 
> byte over and we use the whole MB. This means that a node can look larger in 
> YuniKorn than it really is. The difference is less than 1MB, but it can still 
> decide whether a pod just fits or just does not fit.
> As an example: a pod needs exactly 10MB, 10,000,000 bytes. A node in 
> YuniKorn shows 10MB free, but in reality it does not have 10,000,000 bytes 
> free, only 9,000,001. 
> It can also happen the other way around. The pod asks for 9,000,001 bytes, 
> which YuniKorn sees as 10MB. The node in YuniKorn shows 9MB free, but in 
> reality the node has 9,500,000 bytes free, because a previous pod we 
> scheduled did not use 10MB but only 9,500,000 bytes. YuniKorn fails to place 
> the pod, while the autoscaler says there is enough room to place the pod.
> I know I am splitting hairs here, but it is a real possibility. These 
> failures are really hard to track down and link back to the rounding. Either 
> YuniKorn schedules the pod and the node bind fails with not enough 
> resources, or the scale up fails to trigger when expected.
> With the choice of milli for CPU we have far less of an issue, as K8s does 
> not support more than 3 decimal places. In other words, the smallest value 
> used in K8s is {{1m}} for CPU.
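
The two mismatches described in the quoted issue can be reproduced with a
few lines of arithmetic. The following is a minimal Go sketch, assuming
memory is rounded up to whole decimal MB as the description states. The
names ceilMB and fitsMB are hypothetical and are not taken from the YuniKorn
code base.

```go
package main

import "fmt"

// bytesPerMB is a decimal (base 10) MB, as used in the issue description.
const bytesPerMB = 1_000_000

// ceilMB rounds a byte count up to whole MB: one byte over uses the whole MB.
// Hypothetical helper for illustration only.
func ceilMB(b int64) int64 {
	return (b + bytesPerMB - 1) / bytesPerMB
}

// fitsMB reports whether a pod fits when both sides are compared in whole MB.
// Hypothetical helper for illustration only.
func fitsMB(podBytes, nodeFreeMB int64) bool {
	return ceilMB(podBytes) <= nodeFreeMB
}

func main() {
	// Case 1: the node really has 9,000,001 bytes free but is shown as 10MB.
	// A 10,000,000 byte pod fits in MB terms but not in bytes.
	fmt.Println(fitsMB(10_000_000, ceilMB(9_000_001)), int64(10_000_000) <= int64(9_000_001))

	// Case 2: the node is tracked as 9MB free but really has 9,500,000 bytes.
	// A 9,000,001 byte pod is seen as 10MB and rejected, although it fits.
	fmt.Println(fitsMB(9_000_001, 9), int64(9_000_001) <= int64(9_500_000))
}
```

The first line printed is "true false" and the second is "false true": in
each case the whole-MB view and the byte view disagree about whether the pod
fits.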



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
