[
https://issues.apache.org/jira/browse/YUNIKORN-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wilfred Spiegelenburg resolved YUNIKORN-1085.
---------------------------------------------
Fix Version/s: 1.1.0
Resolution: Fixed
daemon sets can now preempt other workloads
> DaemonSet pods may fail to be scheduled on new nodes added during autoscaling
> -----------------------------------------------------------------------------
>
> Key: YUNIKORN-1085
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1085
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: shim - kubernetes
> Affects Versions: 0.12.2
> Environment: Amazon EKS, K8s 1.20, Cluster Autoscaler
> Reporter: Chaoran Yu
> Assignee: Manikandan R
> Priority: Blocker
> Fix For: 1.1.0
>
> Attachments: sampleNode.txt, samplePod.yaml
>
>
> After YUNIKORN-704 was done, YuniKorn should have the same mechanism as the
> default scheduler when it comes to scheduling DaemonSet pods. That's the case
> most of the time in our deployments. But recently we have found that DaemonSet
> scheduling became problematic again: When K8s Cluster Autoscaler adds new
> nodes in response to pending pods in the cluster, EKS will automatically
> create a CNI DaemonSet (Amazon's container networking module), one pod on
> each newly created node. But YuniKorn could not schedule these pods
> successfully. There are no informative error messages. The default queue that
> these pods belong to has available resources too. Because they couldn't be
> scheduled, EKS refuses to mark the new nodes as ready, and they get stuck in
> the NotReady state. This issue is not always reproducible, but it has happened a
> few times. The root cause needs to be further researched.
> Note that when this bug happened, the mitigation that worked was to disable
> the YuniKorn admission controller, delete all the pending DaemonSet pods,
> and wait for the default scheduler to schedule them all; the new nodes then
> become Ready. So it seems that there are edge cases that haven't been
> covered by the previous work where YuniKorn handles DaemonSet pods differently
> compared to the default scheduler.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)