[
https://issues.apache.org/jira/browse/YUNIKORN-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401485#comment-17401485
]
Weiwei Yang commented on YUNIKORN-704:
--------------------------------------
Hi [~wilfreds], I just checked cordoning and uncordoning a node. You are correct
that the taint "node.kubernetes.io/unschedulable" is added automatically to
cordoned nodes. What you are describing is similar to what we want to do
with the additional placement constraint, which was part of the interface
design but is not implemented today. It is like adding a short circuit for
evaluating node selectors. Dropping what we have done so far
(shim/scheduler-interface changes) and moving in that direction would add a lot
of work.
I just checked the changes [~Huang Ting Yao] has made. They are pretty
straightforward: they just ignore a node's unschedulable state when the ask has
a certain attribute (set when the ask is converted from a daemon set pod). That
should be enough to solve this issue. I think it is better to go with this,
especially since [~chenya_zhang] is waiting on it. Does that make sense?
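
To make the proposal concrete, here is a minimal sketch of the idea, not the
actual patch. It assumes "ignore unschedulable" means that a cordoned node stays
a candidate for asks that come from a daemon set pod. The types Node and Ask,
the DaemonSet field and the candidateNodes function are all hypothetical names
for illustration, not YuniKorn's real structures:

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node.
type Node struct {
	Name          string
	Unschedulable bool // true when the node has been cordoned
}

// Ask is a hypothetical, simplified allocation request.
type Ask struct {
	PodName   string
	DaemonSet bool // assumed attribute, set when the ask is converted from a daemon set pod
}

// candidateNodes returns the nodes that may be considered for the ask.
// Cordoned nodes are skipped for regular asks but kept for daemon set asks.
func candidateNodes(ask Ask, nodes []Node) []Node {
	var out []Node
	for _, n := range nodes {
		if n.Unschedulable && !ask.DaemonSet {
			continue // regular pods must not land on cordoned nodes
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []Node{
		{Name: "node-1"},
		{Name: "node-2", Unschedulable: true},
	}
	for _, ask := range []Ask{
		{PodName: "web"},
		{PodName: "fluent-bit", DaemonSet: true},
	} {
		fmt.Printf("%s:", ask.PodName)
		for _, n := range candidateNodes(ask, nodes) {
			fmt.Printf(" %s", n.Name)
		}
		fmt.Println()
	}
}
```

The check sits in one place in the node filter, which is why this route needs
far less work than a full placement constraint mechanism.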
> [Umbrella] Use the same mechanism to schedule daemon set pods as the default
> scheduler
> --------------------------------------------------------------------------------------
>
> Key: YUNIKORN-704
> URL: https://issues.apache.org/jira/browse/YUNIKORN-704
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: shim - kubernetes
> Reporter: Chaoran Yu
> Assignee: Ting Yao,Huang
> Priority: Blocker
> Fix For: 1.0.0
>
> Attachments: fluent-bit-describe.yaml, fluent-bit.yaml
>
>
> We sometimes see DaemonSet pods fail to be scheduled. Please see attached
> files for the YAML and _kubectl describe_ output of one such pod. We
> originally suspected [node
> reservation|https://github.com/apache/incubator-yunikorn-core/blob/v0.10.0/pkg/scheduler/context.go#L41]
> was to blame. But even after setting the DISABLE_RESERVATION environment
> variable to true, we still see such scheduling failures. The issue is
> especially severe when K8s nodes have disk pressure that causes lots of pods
> to be evicted. Newly created pods will stay in pending forever. We have to
> temporarily uninstall YuniKorn and let the default scheduler do the
> scheduling for these pods.
> This issue is critical because lots of important pods belong to a DaemonSet,
> such as Fluent Bit, a common logging solution. This is probably the last
> remaining roadblock for us to have the confidence to have YuniKorn entirely
> replace the default scheduler.