[ 
https://issues.apache.org/jira/browse/YUNIKORN-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760501#comment-17760501
 ] 

Craig Condit commented on YUNIKORN-1941:
----------------------------------------

{quote}what would be the best way to accomplish this behavior?
{quote}
[~swisscom-ms] that's a very good question.

This is not the first time a feature such as this one has been proposed. In 
fact, this is almost a special case of YUNIKORN-22 (multiple partition 
support). However, as far as I know, no one has come up with a clean solution 
to the autoscaler problem. The autoscaler only considers pods in an 
Unschedulable state, but then uses its own logic (which is essentially a 
duplicate of the scheduler logic) to determine if the pod can actually fit on 
an existing node. If it can, no scale-up is performed. So in the case of nodes 
which are ignored by YuniKorn, you run a very high risk of the autoscaler just 
breaking completely and refusing to scale up more nodes as capacity is needed, 
since from the perspective of the autoscaler, the node you have ignored is 
perfectly acceptable for scheduling. As far as I can see, the only way to make 
sure the autoscaler works as expected is to use appropriate taints on the nodes 
and matching tolerations on your pods, ensuring the autoscaler does not 
consider the other nodes as placement candidates.
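For illustration only, one possible arrangement (the taint key and values below are made up, not anything YuniKorn ships): taint the nodes that YuniKorn should not use, and give the pods that do belong on those nodes a matching toleration. YuniKorn-managed pods carry no toleration, so both the default scheduler and the autoscaler rule those nodes out for them.

```yaml
# Hypothetical taint on the non-YuniKorn nodes:
#   kubectl taint nodes <node> pool=general:NoSchedule
#
# A pod meant for those nodes tolerates the taint;
# YuniKorn-scheduled pods do not, so the autoscaler
# never treats the tainted nodes as candidates for them.
apiVersion: v1
kind: Pod
metadata:
  name: general-workload
spec:
  tolerations:
    - key: pool
      operator: Equal
      value: general
      effect: NoSchedule
  containers:
    - name: app
      image: registry.example.com/app:latest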

Given the prevalence of public cloud environments which use autoscaling, 
building a feature into YuniKorn which basically breaks autoscaling 
environments isn't really a good thing for other users. However, I'd like to 
learn more about your use case. You mentioned reporting; that's definitely an 
area where improvements can be made. There is a large effort currently underway 
to add better event reporting and even expose the event stream via an API. 
Discussions are ongoing about adding usage information to those events as well, 
which would allow you to get per-node statistics over time, and possibly even 
filter out nodes with a type that is undesired.

If you already have your own admission webhook, I would recommend adding any 
tolerations (or alternatively node constraints) there. You can use whatever 
rules make sense for your environment, and you add functionality that works 
across schedulers (not just YuniKorn), so the autoscaler will also work 
correctly. It's pretty easy to add a nodeSelector which targets a particular 
label and ensure that all your pods require that label to exist. This will be 
fully honored by YuniKorn and your main goal of preventing scheduling on 
particular nodes will be achieved.
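As a sketch of what that webhook mutation could look like: a mutating admission webhook returns an RFC 6902 JSON Patch in its AdmissionResponse, and forcing a nodeSelector is a single "add" operation. The label key `yunikorn.io/schedulable` below is a hypothetical convention for illustration, not an existing YuniKorn label.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation, the format a
// mutating admission webhook returns in its AdmissionResponse.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// nodeSelectorPatch builds a patch that forces an admitted pod to
// require the given node label. The key/value are whatever labeling
// convention fits your cluster; nothing here is YuniKorn-specific.
func nodeSelectorPatch(key, value string) ([]byte, error) {
	ops := []patchOp{{
		Op:    "add",
		Path:  "/spec/nodeSelector",
		Value: map[string]string{key: value},
	}}
	return json.Marshal(ops)
}

func main() {
	p, err := nodeSelectorPatch("yunikorn.io/schedulable", "true")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(p))
}
```

Because the patch is applied at admission time, it constrains the pod no matter which scheduler handles it, which is exactly why the autoscaler also does the right thing.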

> Limit Yunikorn to only use certain nodes to schedule workloads
> --------------------------------------------------------------
>
>                 Key: YUNIKORN-1941
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1941
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: shim - kubernetes
>            Reporter: Marc Singer
>            Assignee: Marc Singer
>            Priority: Major
>
> We want to limit Yunikorn to utilize a specific part of the Kubernetes 
> cluster for its workloads. These nodes should have a label or annotation 
> that is configurable in the Yunikorn configuration and, if present, should 
> limit workloads to only be scheduled on these specific nodes.
> According to the slack #dev channel, this should be accomplishable by 
> limiting the nodes returned from the kubernetes-shim.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
