[
https://issues.apache.org/jira/browse/YUNIKORN-2025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775526#comment-17775526
]
Wilfred Spiegelenburg commented on YUNIKORN-2025:
-------------------------------------------------
I think we have a slightly different issue:
{quote}{+}Root cause and proposed solution{+}:
After tracing the preemption flow, for the preempted pod there are 2 parts that
will increase the released container count:
* Core → Shim (TerminationType: PREEMPTED_BY_SCHEDULER)
* Core → Shim (TerminationType: STOPPED_BY_RM) (A callback of Shim →
Core.){quote}
This does not seem right: as with the TIMEOUT processing, we should return the
same termination type to the core.
Using the same termination type prevents these loops. A remove triggered by the
k8shim has STOPPED_BY_RM as the termination type. The k8shim filters those out
when received from the core and processes them specially in the callback, as
the pod should already have been removed.
The removals from the core have PREEMPTED_BY_SCHEDULER, TIMEOUT or
PLACEHOLDER_REPLACED. They trigger a pod removal in the k8shim. The pod removal
confirmation is echoed by the k8shim to the core, and in that release sent to
the core we should see the same termination type.
The core handles TIMEOUT and PLACEHOLDER_REPLACED specially in
{{PartitionContext.removeAllocation()}}. PREEMPTED_BY_SCHEDULER should be
handled similarly to TIMEOUT and not be returned to the k8shim again.
> Mismatched running container count if preemption was triggered
> --------------------------------------------------------------
>
> Key: YUNIKORN-2025
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2025
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler, metrics
> Reporter: Yu-Lin Chen
> Assignee: Yu-Lin Chen
> Priority: Major
> Labels: pull-request-available
> Attachments: Nagative Running Container Count - UI Screenshot.png,
> Release Allocation Flow (Preemption).png,
> YUNIKORN-2025-Mismatched-running-container-count.patch
>
>
> The YuniKorn UI shows a negative running container count after the e2e test
> ‘Verify_basic_preemption’ completes. The e2e test creates the following 4 pods:
> * 3 pods in root.sandbox1 (1 pod will be preempted)
> * 1 pod in root.sandbox2
> The final running container count will be -1, and the internal metrics are:
> * totalContainersRunning(-1) := allocatedContainers(4) -
> releasedContainers({color:#ff0000}5{color})
> → Mismatched: releasedContainers should be 4 instead of 5.
> {+}Reproduce Steps{+}:
> 1. Trigger the e2e test 'Verify_basic_preemption'
>
> {code:bash}
> cd yunikorn-k8shim/test/e2e/preemption
> ginkgo run -r -v --focus "Verify_basic_preemption" -- -yk-namespace
> "yunikorn" -kube-config "$HOME/.kube/config"{code}
> 2. Check the YuniKorn UI; the running container count will be -1.
>
> {+}Root cause and proposed solution{+}:
> After tracing the preemption flow, for the preempted pod there are 2 parts
> that will increase the released container count:
> * Core → Shim (TerminationType: PREEMPTED_BY_SCHEDULER)
> * Core → Shim (TerminationType: STOPPED_BY_RM) (A callback of Shim → Core.)
> We can skip the increase in the first path (Core → Shim, TerminationType:
> PREEMPTED_BY_SCHEDULER).
> (Please refer to "Release Allocation Flow (Preemption).png" and attached
> patch file.)