[jira] [Commented] (YUNIKORN-42) Better to support POD events for YuniKorn to troubleshoot allocation failures

Wilfred Spiegelenburg (Jira) Mon, 30 Mar 2020 05:20:11 -0700


    [ https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070924#comment-17070924 ]


Wilfred Spiegelenburg commented on YUNIKORN-42:
-----------------------------------------------

I had not looked at this before, but I agree with Tao's remark: we need this on
the scheduler side. From the scheduler's point of view, a pod is just a simple
allocation. Some of the data might make sense to flow back to the K8s side as
events, but not all of it will.

The fact that a pod is not scheduled is internal to the scheduler. Pushing the
information back to K8s does not make it any clearer in most cases. How do you
explain YuniKorn scheduling limits in K8s terms? For example: the queue is out of
resources, the user limit has been reached, or even the maximum number of
applications a user can run has been reached. That kind of information does not
translate to, or help on, the K8s side. We need to be able to show it on our web
UI or via the metrics that we publish. Saying that publishing YuniKorn internal
events in a K8s way solves the problem of how to troubleshoot YuniKorn is
oversimplifying the issue at best and incorrect at worst.

Every event that we push back will be stored in etcd. I am worried about the
amount of data we will push to etcd if we are not careful.
Can we also guarantee that the YuniKorn admin can always describe all the pods
and namespaces? That would require the admin to have high-level access to the
K8s cluster, which they might not have. This is something we need to keep in mind.

> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-42
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-42
>             Project: Apache YuniKorn
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently it is tricky to troubleshoot pod allocation; we need to better expose
> this information in the pod description.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

