[ 
https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064892#comment-17064892
 ] 

Wangda Tan commented on YUNIKORN-42:
------------------------------------

Thanks [~Tao Yang], you're absolutely right. 

There are two requirements in the doc: 
{code}
- Events and diagnostics can be retrieved from YuniKorn UI / REST API. (P2) 
- Be able to filter events/diagnostics based on queue/app/node/pod. (P2)
{code}

I marked them as P2 for now because retrieving them from the YuniKorn web UI 
needs a lot of front-end work to complete, and we have to make sure that access 
control for the UI can be enforced. 

Meanwhile, many early users of YuniKorn are already familiar with existing K8s 
tools, such as pod events, for understanding what's going on in the cluster. So 
I think we can publish pod events first, which should satisfy 70-80% of 
troubleshooting needs without requiring admins to learn anything specific to 
YuniKorn.
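
To make the idea concrete, here is a rough sketch of what publishing a pod 
event on an allocation failure could look like. Everything in it is a 
hypothetical illustration: the PodEvent type, the EventPublisher interface, the 
"AllocationFailed" reason and the example queue name are assumptions for this 
sketch, not YuniKorn or client-go APIs. An in-memory publisher is used so that 
it runs without a cluster.

```go
package main

import "fmt"

// PodEvent is a hypothetical, simplified stand-in for a Kubernetes event
// attached to a pod. It is not the real client-go type.
type PodEvent struct {
	Namespace string
	Pod       string
	Type      string // "Normal" or "Warning", following the K8s convention
	Reason    string
	Message   string
}

// EventPublisher is a hypothetical interface the scheduler could call
// whenever it has something to tell the user about a pod.
type EventPublisher interface {
	Publish(e PodEvent)
}

// memoryPublisher keeps events in memory so the sketch runs without a cluster.
type memoryPublisher struct {
	events []PodEvent
}

func (m *memoryPublisher) Publish(e PodEvent) {
	m.events = append(m.events, e)
}

// reportAllocationFailure records why the scheduler could not place a pod.
func reportAllocationFailure(p EventPublisher, namespace, pod, cause string) {
	p.Publish(PodEvent{
		Namespace: namespace,
		Pod:       pod,
		Type:      "Warning",
		Reason:    "AllocationFailed", // hypothetical reason string
		Message:   fmt.Sprintf("pod %s/%s could not be allocated: %s", namespace, pod, cause),
	})
}

func main() {
	pub := &memoryPublisher{}
	reportAllocationFailure(pub, "default", "sleep-1", "queue root.default has no headroom")
	for _, e := range pub.events {
		fmt.Printf("%s %s %s\n", e.Type, e.Reason, e.Message)
	}
}
```

In a real deployment the publisher would write to the K8s API instead, so that 
the message shows up in the pod description that admins already look at.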

We should definitely consider more powerful and integrated tools for admins 
to use, similar to YARN-9050, after we finish the basic work.

> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-42
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-42
>             Project: Apache YuniKorn
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Priority: Major
>
> Troubleshooting pod allocation is currently tricky; we need to better expose 
> this information in the POD description.



