[
https://issues.apache.org/jira/browse/YUNIKORN-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17777131#comment-17777131
]
Wilfred Spiegelenburg commented on YUNIKORN-2020:
-------------------------------------------------
One thing YuniKorn does that the default scheduler does not do is bin
packing. The default scheduler spreads the pods over the nodes. That might be
why you see 2 nodes with 15 pods each. Does the default scheduler fail to
schedule 25 pods when it only has one node? If that case fails in the same way,
it is not a YuniKorn issue. If the last 2 pods in that case do not get scheduled
on the node, we missed something.
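To make the difference concrete, here is a minimal sketch of the two placement
strategies. It is a toy model, not YuniKorn or kube-scheduler code: the Node
class, the pod capacity of 110 and the node names are assumptions for
illustration, and the attach limit of 23 and the 30 pods come from the report.
The placement functions deliberately ignore the attach limit, as a scheduler
that cannot see it would. It does not model the StatefulSet creating pods one
at a time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    pod_capacity: int   # visible to the scheduler (assumed value)
    attach_limit: int   # NOT consulted by either placement function below
    pods: int = 0


def make_nodes():
    # Two hypothetical nodes; 110 pods per node is an assumed capacity.
    return [Node("node-a", 110, 23), Node("node-b", 110, 23)]


def bin_pack(nodes, pod_count):
    """Place each pod on the first node that still has pod capacity."""
    for _ in range(pod_count):
        target = next(n for n in nodes if n.pods < n.pod_capacity)
        target.pods += 1


def spread(nodes, pod_count):
    """Place each pod on the node that currently has the fewest pods."""
    for _ in range(pod_count):
        target = min(nodes, key=lambda n: n.pods)
        target.pods += 1


def over_attach_limit(nodes):
    """Names of nodes holding more pods (one volume each) than the limit."""
    return [n.name for n in nodes if n.pods > n.attach_limit]


if __name__ == "__main__":
    packed, spreaded = make_nodes(), make_nodes()
    bin_pack(packed, 30)
    spread(spreaded, 30)
    print("bin packing:", [(n.name, n.pods) for n in packed],
          "over limit:", over_attach_limit(packed))
    print("spreading:  ", [(n.name, n.pods) for n in spreaded],
          "over limit:", over_attach_limit(spreaded))
```

In this model spreading stays under the limit only because 30 pods happen to
split into 15 per node, not because the limit is checked. That is why the
single node test matters.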
We push everything onto one node until that node is full. The default scheduler
does not do bin packing out of the box; you need to reconfigure it to do that.
YuniKorn checks a number of resources, such as memory, CPU and pods. If the node
does not expose the EBS volume limit on the node object, we have no idea that
the limit exists. We also run all the predicates; they do not fail, so K8s does
not tell us that EBS is full via that path either.
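A quick way to see what the cluster actually publishes: CSI drivers report
their per-node volume limit on the CSINode object, under
spec.drivers[].allocatable.count, rather than on the Node object. The sketch
below reads that field from the JSON of a CSINode. The helper name and the
sample object are made up for illustration. The driver name ebs.csi.aws.com is
the one the AWS EBS CSI driver registers. Whether YuniKorn consults this field
is not something this thread establishes.

```python
import json


def attach_limit(csinode, driver="ebs.csi.aws.com"):
    """Return the volume limit a CSI driver reports on a CSINode, or None.

    `csinode` is the parsed JSON of one CSINode object. None means the driver
    is not registered on the node or reports no limit.
    """
    for entry in csinode.get("spec", {}).get("drivers", []):
        if entry.get("name") == driver:
            allocatable = entry.get("allocatable") or {}
            return allocatable.get("count")
    return None


# Hand-written sample for illustration only, not output from a real cluster.
SAMPLE = json.loads("""
{
  "kind": "CSINode",
  "metadata": {"name": "XYZ"},
  "spec": {
    "drivers": [
      {"name": "ebs.csi.aws.com", "nodeID": "i-example",
       "allocatable": {"count": 23}}
    ]
  }
}
""")

if __name__ == "__main__":
    print("reported EBS attach limit:", attach_limit(SAMPLE))
```

To run it against a real node, replace SAMPLE with the parsed output of
"kubectl get csinode <node-name> -o json". A result of None would mean no limit
is published there for the scheduler to act on.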
Can you run a check that shows the default scheduler does not exceed the limit
even when forced?
> Yunikorn aware of EBS volume attach limit per node
> --------------------------------------------------
>
> Key: YUNIKORN-2020
> URL: https://issues.apache.org/jira/browse/YUNIKORN-2020
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Affects Versions: 1.3.0
> Environment: AWS EKS 1.24
> Reporter: Timothy Potter
> Priority: Major
>
> Yunikorn attempts to schedule pods with EBS PV on node beyond the EBS volume
> attach limit for that node.
> I have my EBS CSI driver attach limit set to 23, but when I schedule pods via
> Yunikorn, it schedules too many pods on the same node after the limit is
> reached.
> For example, I have a test STS that requests 30 pods with PVC. With Yunikorn,
> I get 25 scheduled (vs. the 23 max limit I configured, not sure if that's a
> bug in EBS CSI driver) and the last pod is stuck in {{ContainerCreating}}
> state.
> {code}
> Warning FailedAttachVolume 2m42s (x40 over 68m) attachdetach-controller
> AttachVolume.Attach failed for volume
> "pvc-be15c348-905d-4c7d-95d3-cffb2cb893ee" : rpc error: code = Internal desc
> = Could not attach volume "vol-0fb6f7656a825c6f0" to node "XYZ": attachment
> of disk "vol-0fb6f7656a825c6f0" failed, expected device to be attached but
> was attaching
> {code}
> If I try to schedule the same test STS using the default scheduler, it
> schedules all 30 pods but across 2 nodes as expected.
> How can we configure Yunikorn to recognize / respect this EBS vol attach
> limit?