[
https://issues.apache.org/jira/browse/YUNIKORN-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305545#comment-17305545
]
Chaoran Yu commented on YUNIKORN-587:
-------------------------------------
[~wwei] Good catch. I just checked. Indeed the memory requested by each of my
placeholder pods was lower than the actual memory used by the real pods. In
particular, when calculating the right amount of memory to reserve, I didn't
properly take into account of the Spark driver and executor memory overhead.
I'll make sure this is taken care of as part of
[YUNIKORN-558|https://issues.apache.org/jira/browse/YUNIKORN-558]. But I wonder
if YK can do something to guard against these kind of user errors. Maybe when
swapping the placeholder with the real pods, when the resources don't match,
abort the swapping and produce an error event? Or only allow swapping if
placeholder reserved more resources than the real pods?
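To make the idea concrete, here is a minimal Go sketch of both halves of the problem. The `Resource` map and `fitsIn` helper are simplified illustrations, not the actual YuniKorn scheduler types; the overhead formula is Spark's documented default (`spark.executor.memoryOverhead` = max(384 MiB, 10% of executor memory)), which is the part my placeholder sizing missed:

```go
package main

import "fmt"

// Resource is a simplified resource vector keyed by resource name
// (e.g. "memory" in MiB, "vcore" in millicores). This is an
// illustration only, not the real YuniKorn resource type.
type Resource map[string]int64

// fitsIn reports whether everything the real pod requests is covered
// by what the placeholder reserved. A swap guard could abort the
// placeholder replacement and emit an error event when this returns
// false, instead of letting node accounting go negative.
func fitsIn(placeholder, realPod Resource) bool {
	for name, requested := range realPod {
		if placeholder[name] < requested {
			return false
		}
	}
	return true
}

// sparkExecutorPodMemory shows why a placeholder sized from
// spark.executor.memory alone is too small: the executor pod's actual
// memory request also includes the overhead, by default
// max(384 MiB, 0.10 * executor memory).
func sparkExecutorPodMemory(executorMemMiB int64) int64 {
	overhead := executorMemMiB / 10 // default overhead factor 0.10
	if overhead < 384 {
		overhead = 384 // default floor in MiB
	}
	return executorMemMiB + overhead
}

func main() {
	// Placeholder sized from spark.executor.memory alone (4 GiB)...
	placeholder := Resource{"memory": 4096, "vcore": 1000}
	// ...but the real executor pod requests memory plus overhead.
	realPod := Resource{"memory": sparkExecutorPodMemory(4096), "vcore": 1000}

	fmt.Println(realPod["memory"])          // 4096 + 409 = 4505
	fmt.Println(fitsIn(placeholder, realPod)) // false: swap should be rejected
}
```

With a check like this in the shim, the mismatched swap would be rejected up front rather than surfacing later as negative allocated resources on the node.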
> Allocated resources on a node could become negative
> ---------------------------------------------------
>
> Key: YUNIKORN-587
> URL: https://issues.apache.org/jira/browse/YUNIKORN-587
> Project: Apache YuniKorn
> Issue Type: Sub-task
> Components: shim - kubernetes
> Affects Versions: 0.10
> Reporter: Chaoran Yu
> Priority: Critical
> Attachments: Screen Shot 2021-03-19 at 9.43.40 PM.png
>
>
> There are cases where the K8s shim thinks that the allocated resources on a
> node are negative, causing the amount of available resources to appear larger
> than the actual node capacity. More investigation is needed to understand what
> causes this issue.
> Note that I didn't set up any queue resource quotas.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]