[
https://issues.apache.org/jira/browse/YUNIKORN-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Craig Condit closed YUNIKORN-1347.
----------------------------------
Resolution: Implemented
> Yunikorn triggers EKS auto-scaling even when pod requests have exceeded the
> queue limit
> ------------------------------------------------------------------------------------
>
> Key: YUNIKORN-1347
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1347
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: core - scheduler, shim - kubernetes
> Reporter: Anthony Wu
> Priority: Major
>
> Hi guys,
> We are trying to utilise Yunikorn to manage our AWS EKS infrastructure to
> limit resource usage for different users and groups. We also use k8s cluster
> auto-scaler
> ([https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler])
> for auto scaling of the cluster when necessary.
> *Environment*
> * AWS EKS on k8s 1.21
> * Yunikorn 1.1 running as k8s scheduler plugin to be most compatible
> * cluster-autoscaler V1.21.0
> *Issues*:
> Let's say we have a queue with the below limit:
> {code:yaml}
> queues:
>   - name: dev
>     submitacl: "*"
>     resources:
>       max:
>         memory: 100Gi
>         vcore: 10
> {code}
>
> Then we try to create 4 pods in the `dev` queue, each requiring 5 cores and
> 50Gi memory (see the sketch below for an illustrative pod spec).
> We then get 2 pods {{Running}} and 2 pods {{Pending}}, because the queue has
> reached its limit of 100Gi memory and 10 vcores.
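> A minimal sketch of one of these pods, assuming the conventional YuniKorn
> `queue`/`applicationId` labels and `schedulerName: yunikorn`; the names and
> image are placeholders, not our actual manifests:
> {code:yaml}
> apiVersion: v1
> kind: Pod
> metadata:
>   name: dask-user-example                      # hypothetical pod name
>   labels:
>     applicationId: "dask-user-example-0001"    # assumed application id label
>     queue: "root.dev"                          # submit into the dev queue
> spec:
>   schedulerName: yunikorn                      # hand the pod to YuniKorn
>   containers:
>     - name: worker
>       image: dask-worker:latest                # placeholder image
>       resources:
>         requests:
>           cpu: "5"
>           memory: 50Gi
>         limits:
>           cpu: "5"
>           memory: 50Gi
> {code}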
> We would expect the queued pods not to trigger EKS auto-scaling, as they
> cannot be allocated until other resources have been released in the queue.
> But what we see is that the queued pods still trigger cluster auto-scaling
> regardless, as shown in the example below:
> {code}
> Status:       Pending
> ...
> Conditions:
>   Type           Status
>   PodScheduled   False
> Events:
>   Type     Reason            Age    From                Message
>   ----     ------            ----   ----                -------
>   Warning  FailedScheduling  3m5s   yunikorn            0/147 nodes are available: 147 Pod is not ready for scheduling.
>   Warning  FailedScheduling  3m5s   yunikorn            0/147 nodes are available: 147 Pod is not ready for scheduling.
>   Normal   Scheduling        3m3s   yunikorn            yunikorn/dask-user-07ff5f3b-8qjkl8 is queued and waiting for allocation
>   Normal   TriggeredScaleUp  2m53s  cluster-autoscaler  pod triggered scale-up: [{eksctl-cluster-nodegroup-spot-xlarge-compute-1-NodeGroup-8VURTD4WKCYV 0->4 (max: 16)}]
> {code}
> So eventually EKS added some hosts that were never actually used or
> allocated, because the pods had not yet been approved for scheduling.
> We also tried gang scheduling with the pods in a task group, but it has a
> similar issue: even when the whole gang is not ready to be scheduled,
> Yunikorn creates the placeholder pods, which triggers auto-scaling of the EKS
> cluster (task-group annotations sketched below).
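> For context, the gang scheduling attempt used task-group annotations roughly
> like the following sketch, based on the YuniKorn gang scheduling docs; the
> group name and numbers are illustrative, not our exact values:
> {code:yaml}
> metadata:
>   annotations:
>     # name of the task group this pod belongs to (hypothetical)
>     yunikorn.apache.org/task-group-name: "dask-workers"
>     # declare the gang; YuniKorn creates placeholder pods for minMember
>     yunikorn.apache.org/task-groups: |-
>       [{
>         "name": "dask-workers",
>         "minMember": 4,
>         "minResource": {
>           "cpu": "5",
>           "memory": "50Gi"
>         }
>       }]
> {code}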
> *Causes and potential solutions*
> We looked at the source code of both the auto-scaler and Yunikorn, and we
> think the reason is simply that the auto-scaler does not know about the
> Yunikorn-specific events and state (Pending but not quota-approved) of a Pod.
> It searches for all Pods with `PodScheduled=False` and then checks whether it
> needs to add resources for them.
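> Concretely, the queued pods carry a status fragment roughly like the sketch
> below, which is what the cluster-autoscaler keys on; the exact reason and
> message text set by Yunikorn may differ:
> {code:yaml}
> status:
>   phase: Pending
>   conditions:
>     - type: PodScheduled
>       status: "False"
>       reason: Unschedulable   # assumed reason; the auto-scaler treats such pods as unschedulable
>       message: "0/147 nodes are available: 147 Pod is not ready for scheduling."
> {code}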
> The issue could be resolved from either side:
> - To solve it on the auto-scaler side, it would need to know about the
> special events and state of Yunikorn.
> - To solve it on the Yunikorn side, I think it would need to not create the
> pod, or at least not leave it in the `Pending` phase, until it is quota
> approved.
> ** Not sure how hard this is to achieve, but as long as a pod is created and
> goes to Pending, the auto-scaler will try to pick it up.
> We think solving it on the Yunikorn side would be cleaner, since the
> auto-scaler should not need to know about the k8s scheduler implementation in
> order to make a decision. Also, other auto-scaler alternatives like AWS
> Karpenter could suffer from the same issue when interacting with Yunikorn.
> Wondering whether this issue report makes sense to you guys. Let us know if
> there are any other solutions and whether it is possible to solve this in the
> future :)
> Thanks a lot!
>