[
https://issues.apache.org/jira/browse/YUNIKORN-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Craig Condit updated YUNIKORN-2866:
-----------------------------------
Target Version: 1.7.0 (was: 1.6.0, 1.7.0)
> [UMBRELLA] Support InPlacePodVerticalScaling (phase 2)
> ------------------------------------------------------
>
> Key: YUNIKORN-2866
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2866
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler, shim - kubernetes
> Reporter: Craig Condit
> Assignee: Craig Condit
> Priority: Major
>
> Kubernetes 1.27 added a new [InPlacePodVerticalScaling|http://example.com/]
> feature. While this is currently still in an alpha state as of 1.30 (and
> therefore requires a feature flag to enable), it will likely be moved to beta
> in 1.32, meaning it will be enabled by default, and considered stable in an
> upcoming release. The implementation of this feature has implications for
> YuniKorn, as with the feature enabled, the requests and limits of a Pod are
> no longer immutable.
> Fortunately, the updated API objects that enable the feature already contain
> the new fields, so we can add initial support for the feature now. To enable
> the feature for testing in a kind cluster, the cluster configuration needs
> to contain the following:
> {noformat}
> kind: Cluster
> apiVersion: kind.x-k8s.io/v1alpha4
> featureGates:
>   "InPlacePodVerticalScaling": true
> {noformat}
> During scheduling of new pods, the requested resources are still used as
> before.
> However, once a pod has been started, the actual resource utilization needs
> to be tracked via a new {{Pod.Status.ContainerStatuses[].AllocatedResources}}
> collection. In addition, if the value of {{Pod.Status.Resize}} is set to
> {{Proposed}}, the usage of each container needs to be computed as the
> maximum of its requested and allocated resources. The requested resources
> field becomes mutable once this feature is turned on, and it represents the
> latest *requested* (not actual) usage of the container.
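
The usage rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ResourceList and Container are simplified, hypothetical stand-ins for the Kubernetes API types (a real shim would read them from the pod spec and container statuses), and effectiveContainerUsage and effectivePodUsage are made-up helper names, not existing YuniKorn functions. It also assumes that allocated resources take precedence over requests whenever they are populated and no resize is proposed, which is one reading of the description.

```go
package main

import "fmt"

// ResourceList is a simplified, hypothetical stand-in for the Kubernetes
// resource list type: resource name -> quantity in base units.
type ResourceList map[string]int64

// Container is a hypothetical, reduced view of a single container.
type Container struct {
	Requested ResourceList // requests from the pod spec (mutable with the feature on)
	Allocated ResourceList // allocated resources from the container status; empty until running
}

// effectiveContainerUsage returns the usage to account for one container.
//   - No allocated resources (feature inactive or pod not running): use requests.
//   - Resize is "Proposed": per resource, the maximum of requested and allocated.
//   - Otherwise: allocated values override the requests (assumption).
func effectiveContainerUsage(c Container, resize string) ResourceList {
	usage := ResourceList{}
	for name, requested := range c.Requested {
		usage[name] = requested
	}
	if len(c.Allocated) == 0 {
		return usage
	}
	for name, allocated := range c.Allocated {
		if resize == "Proposed" {
			if allocated > usage[name] {
				usage[name] = allocated
			}
		} else {
			usage[name] = allocated
		}
	}
	return usage
}

// effectivePodUsage sums the effective usage of all containers in a pod.
func effectivePodUsage(containers []Container, resize string) ResourceList {
	total := ResourceList{}
	for _, c := range containers {
		for name, value := range effectiveContainerUsage(c, resize) {
			total[name] += value
		}
	}
	return total
}

func main() {
	pod := []Container{
		{
			Requested: ResourceList{"cpu": 500, "memory": 1024},
			Allocated: ResourceList{"cpu": 250, "memory": 1024},
		},
	}
	// While a resize is proposed, the larger of the two values is counted.
	fmt.Println(effectivePodUsage(pod, "Proposed"))
	// Without a pending resize, the allocated values are counted.
	fmt.Println(effectivePodUsage(pod, ""))
}
```

Because a container without allocated resources falls back to its requests, the same code path should also behave as before on clusters where the feature gate is off.
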
> Supporting this feature is not optional within YuniKorn, as failure to
> process the updated resources will mean that we do not account for resource
> usage correctly if a pod is updated.
> Several steps will need to be taken to support this properly:
> * Ensure that GetPodResources() accurately computes the effective usage of
> the Pod in all cases. Since the AllocatedResources field will not be
> populated when this feature is not active, and is only set once the pod is in
> a running state, this is fairly straightforward and can be implemented even
> in clusters which do not have this feature enabled.
> * The result of GetPodResources() will need to be cached in the shim so that
> we can detect resource changes on Pod updates. Comparing the result of
> GetPodResources() on the new Pod vs. the existing version will allow us to
> easily detect changes.
> * If changes are detected to a running YuniKorn-managed pod, an update
> message will need to be sent from the shim to the core to change the
> resources of the allocated task.
> * If changes are detected to a running non-YuniKorn-managed pod, an update
> of the node's utilized resources will need to be sent from the shim to the core.
> * The core *must not* reject these updates, even if they would cause a queue
> to go over capacity. Instead, they must be applied to the appropriate ask or
> allocation unconditionally.
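
The caching and change-detection steps listed above could look something like the sketch below. Everything in it is hypothetical: usageCache, observe, UpdateKind and the three update kinds are illustrative names rather than existing shim or scheduler-interface APIs, and ResourceList is again a simplified stand-in for the Kubernetes type. It assumes pods are keyed by UID and that the first sighting of a pod is handled by the normal scheduling path, so it is not reported as a change.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceList is a simplified, hypothetical stand-in for the Kubernetes
// resource list type: resource name -> quantity in base units.
type ResourceList map[string]int64

// equalResources reports whether two resource lists hold identical entries.
func equalResources(a, b ResourceList) bool {
	if len(a) != len(b) {
		return false
	}
	for name, value := range a {
		if other, ok := b[name]; !ok || other != value {
			return false
		}
	}
	return true
}

// UpdateKind names the message the shim would send to the core (hypothetical).
type UpdateKind int

const (
	NoChange           UpdateKind = iota // nothing to send
	AllocationUpdate                     // YuniKorn-managed pod: update the allocation
	NodeOccupiedUpdate                   // foreign pod: update the node's occupied resources
)

// usageCache remembers the last computed usage per pod UID.
type usageCache struct {
	mu    sync.Mutex
	usage map[string]ResourceList
}

func newUsageCache() *usageCache {
	return &usageCache{usage: make(map[string]ResourceList)}
}

// observe stores the current usage of a pod and reports which update, if any,
// should be sent. A pod seen for the first time yields NoChange (assumption:
// new pods go through the regular scheduling path).
func (c *usageCache) observe(podUID string, current ResourceList, managed bool) UpdateKind {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Copy the map so later mutation by the caller cannot alter the cache.
	stored := make(ResourceList, len(current))
	for name, value := range current {
		stored[name] = value
	}
	previous, seen := c.usage[podUID]
	c.usage[podUID] = stored

	if !seen || equalResources(previous, stored) {
		return NoChange
	}
	if managed {
		return AllocationUpdate
	}
	return NodeOccupiedUpdate
}

func main() {
	cache := newUsageCache()
	cache.observe("pod-1", ResourceList{"cpu": 500}, true)
	// The same pod is resized in place; the shim would now notify the core.
	kind := cache.observe("pod-1", ResourceList{"cpu": 750}, true)
	fmt.Println(kind == AllocationUpdate)
}
```

On the core side, the handler for either update kind would apply the new resources without a capacity check, in line with the last bullet.
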