[ https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070924#comment-17070924 ]
Wilfred Spiegelenburg commented on YUNIKORN-42:
-----------------------------------------------
I had not looked at this before, but I agree with Tao's remark: we need this on
the scheduler side. From the scheduler's point of view a pod is just a simple
allocation. Some of the data might make sense to flow back as events on the K8s
side, but not all of it will.
The fact that the pod is not scheduled is a scheduler-internal matter. Pushing
the information back to K8s does not make it any clearer in most cases. How do
you explain YuniKorn scheduling limits in K8s? For example: the queue is out of
resources, the user limit has been reached, or the maximum number of
applications that a user can run has been reached. That kind of information
does not translate or help on the K8s side. We need to be able to show that on
our web UI or via metrics that we publish. Saying that publishing YuniKorn
internal events in a K8s way solves the problem of how to troubleshoot YuniKorn
is oversimplifying the issue at best and incorrect at worst.
Every event that we push back will be stored in etcd. I am worried about the
amount of data we will push back to etcd if we are not careful.
Can we also guarantee that the YuniKorn admin can always describe all the pods
and namespaces? That would require the admin to have high-level access to the
K8s cluster, which they might not have. That is something we need to keep in
mind.
> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
> Key: YUNIKORN-42
> URL: https://issues.apache.org/jira/browse/YUNIKORN-42
> Project: Apache YuniKorn
> Issue Type: Task
> Reporter: Wangda Tan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently it is tricky to troubleshoot pod allocation; we need to better
> expose this information in the pod description.