[ 
https://issues.apache.org/jira/browse/YUNIKORN-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-741.
--------------------------------------------
    Fix Version/s: 0.11
       Resolution: Fixed

This issue does not occur in any released version.
It is fixed only in the release in which YUNIKORN-677 was added.

> Regression: occupied resources miscalculated sometimes for yunikorn pods
> ------------------------------------------------------------------------
>
>                 Key: YUNIKORN-741
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-741
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: shim - kubernetes
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11
>
>
> This is a regression caused by YUNIKORN-677.
> YUNIKORN-677 changed how we decide whether a pod needs recovery: the check 
> is now based on whether the pod is allocated to a node (that is, whether 
> pod.Spec.NodeName is set). Occupied resources are handled in a similar way. 
> However, the fix in YUNIKORN-677 changed the condition for occupied resource 
> recovery but left the node coordinator code (where we handle pod updates) 
> using the old condition. This causes the following issue:
>  * During recovery, the scheduler sees that the scheduler pod is already 
> allocated (pod.Spec.NodeName is set), so its occupied resources are reported 
> to the core. Code: 
> [https://github.com/apache/incubator-yunikorn-k8shim/blob/5658ce32f630d5ea75cea2772522a76ced30250a/pkg/cache/context_recovery.go#L113-L128].
>  * Once the scheduler has recovered, the pod informers are started and 
> the node coordinator starts to run. In some cases, the node informer then 
> informs us of phase changes (from Pending to Running) for the scheduler pod 
> and the admission-controller pod, and this triggers another occupied 
> resource update for resources that were already reported during recovery. 
> Code: 
> [https://github.com/apache/incubator-yunikorn-k8shim/blob/5658ce32f630d5ea75cea2772522a76ced30250a/pkg/cache/node_coordinator.go#L74-L101]
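
The mismatch described above can be illustrated with a minimal, self-contained
sketch. It is not the k8shim code: the Pod and Node types, the function names,
and the 500m CPU figure are all hypothetical, and the "consistent" condition is
only one possible way to align the two code paths, not necessarily the fix that
was merged.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	NodeName string // mirrors pod.Spec.NodeName
	Phase    string // "Pending" or "Running"
	MilliCPU int64  // requested CPU, in millicores (illustrative value)
}

// Node tracks the occupied resources reported to the core (hypothetical).
type Node struct {
	OccupiedMilliCPU int64
}

// isAssigned is the recovery-time condition: the pod is allocated to a node.
func isAssigned(p Pod) bool { return p.NodeName != "" }

// recoverPod reports occupied resources for pods that are already assigned.
func (n *Node) recoverPod(p Pod) {
	if isAssigned(p) {
		n.OccupiedMilliCPU += p.MilliCPU
	}
}

// onUpdateOld is an assumed form of the old coordinator condition:
// it reacts to a Pending -> Running phase change.
func (n *Node) onUpdateOld(oldPod, newPod Pod) {
	if oldPod.Phase == "Pending" && newPod.Phase == "Running" {
		n.OccupiedMilliCPU += newPod.MilliCPU
	}
}

// onUpdateConsistent uses the same notion of "assigned" as recovery:
// it reacts only when the pod goes from unassigned to assigned.
func (n *Node) onUpdateConsistent(oldPod, newPod Pod) {
	if !isAssigned(oldPod) && isAssigned(newPod) {
		n.OccupiedMilliCPU += newPod.MilliCPU
	}
}

// simulate replays the sequence from the issue: recovery first, then a
// Pending -> Running update for a pod whose NodeName was already set.
func simulate(consistent bool) int64 {
	pending := Pod{NodeName: "node-1", Phase: "Pending", MilliCPU: 500}
	running := pending
	running.Phase = "Running"

	node := &Node{}
	node.recoverPod(pending)
	if consistent {
		node.onUpdateConsistent(pending, running)
	} else {
		node.onUpdateOld(pending, running)
	}
	return node.OccupiedMilliCPU
}

func main() {
	fmt.Printf("mismatched conditions: %dm occupied\n", simulate(false))
	fmt.Printf("consistent conditions: %dm occupied\n", simulate(true))
}
```

With mismatched conditions the pod's 500m is counted twice (1000m); when both
paths key off the node assignment it is counted once (500m).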



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
