[ 
https://issues.apache.org/jira/browse/YUNIKORN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-2025.
---------------------------------------------
    Fix Version/s: 1.4.0
       Resolution: Fixed

Release confirmations triggered by preempted containers are no longer passed 
back to the shim. This fixes the double counting of releases.

Thank you for the fix, [~Yu-Lin Chen].

> Mismatched running container count if preemption was triggered
> --------------------------------------------------------------
>
>                 Key: YUNIKORN-2025
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2025
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler, metrics
>            Reporter: Yu-Lin Chen
>            Assignee: Yu-Lin Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.4.0
>
>         Attachments: Nagative Running Container Count - UI Screenshot.png, 
> Release Allocation Flow (Preemption).png, 
> YUNIKORN-2025-Mismatched-running-container-count.patch
>
>
> The YuniKorn UI shows a negative running container count after the e2e test 
> 'Verify_basic_preemption' completes. The e2e test creates the following 4 
> pods:
>  * 3 pods in root.sandbox1 (1 pod will be preempted)
>  * 1 pod in root.sandbox2
> The final running container count is -1, and the internal metrics are:
>  * totalContainersRunning(-1) := allocatedContainers(4) - 
> releasedContainers({color:#ff0000}5{color})
> → Mismatched: releasedContainers should be 4 instead of 5.
> {+}Reproduce Steps{+}:
> 1. Trigger the e2e test 'Verify_basic_preemption'
>  
> {code:java}
> cd yunikorn-k8shim/test/e2e/preemption
> ginkgo run -r -v --focus "Verify_basic_preemption" -- \
>   -yk-namespace "yunikorn" -kube-config "$HOME/.kube/config"
> {code}
> 2. Check the YuniKorn UI; the running container count will be -1.
>  
> {+}Root cause and proposed solution{+}:
> After tracing the preemption flow, there are 2 paths that increase the 
> released container count for the preempted pod: 
>  * Core → Shim (TerminationType: PREEMPTED_BY_SCHEDULER) 
>  * Core → Shim (TerminationType: STOPPED_BY_RM) (a callback of Shim → Core)
> We can skip the increase in the first path (Core → Shim, TerminationType: 
> PREEMPTED_BY_SCHEDULER). 
> (Please refer to "Release Allocation Flow (Preemption).png" and the attached 
> patch file.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
