[ https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094145#comment-17094145 ]
Weiwei Yang commented on YUNIKORN-42:
-------------------------------------

Hi [~adam.antal]

Thanks for working on this! I went through the previous discussions and the design doc, and I think there are some key areas we need to sort out.

1) Expose via the REST server or flush to the k8s event system

The design addresses both, which makes sense. But which one is more important, and which one will come first? In my opinion it is the k8s event system, because this is how users consume it. The key purpose of this Jira is not to make our lives easier; we need to make users' lives easier. That is, by describing pods/nodes, they can understand e.g. 90% of the reasons why a pod is not allocated.

2) The cache

The key problem is the cache, and how we build an efficient one. The scheduler can push events (or records) to this cache, and the cache can be queried (via REST) or periodically flushed (to the k8s event system).

3) Aggregate records

When the scheduler pushes events/records to the cache, duplicate records should be aggregated. It is therefore important to design the schema of each record so that we can aggregate them properly. For example, when we try to assign a pod, it may fail again and again in the scheduler loop. In such a case, we would say "pod is unable to be allocated due to xxx reason, N times in past X seconds".

> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
>                 Key: YUNIKORN-42
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-42
>             Project: Apache YuniKorn
>          Issue Type: Task
>            Reporter: Wangda Tan
>            Assignee: Adam Antal
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is currently tricky to troubleshoot pod allocation; we need to better expose
> this information in the pod description.
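To illustrate points 2) and 3) of the comment above, here is a minimal sketch of a cache that aggregates duplicate records and can be either queried or flushed. It is not taken from YuniKorn's code or from the design doc. The names EventRecord and EventCache, the choice of (object_id, reason) as the aggregation key, and the example reason strings are all assumptions made for illustration. Python is used only for brevity.

```python
import threading
import time
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass
class EventRecord:
    """One aggregated record. The fields are a hypothetical schema."""
    object_id: str     # e.g. "default/pod-a"
    reason: str        # short machine-readable cause, used for aggregation
    message: str       # human-readable text of the latest occurrence
    count: int         # how many times this (object_id, reason) was pushed
    first_seen: float  # timestamp of the first occurrence, in seconds
    last_seen: float   # timestamp of the latest occurrence, in seconds

    def describe(self) -> str:
        # Renders the "N times in past X seconds" form from the comment.
        window = self.last_seen - self.first_seen
        return (f"{self.object_id}: {self.message} "
                f"({self.count} times in past {window:.0f} seconds)")


class EventCache:
    """Thread-safe cache that aggregates records sharing the same key."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so tests can control time
        self._lock = threading.Lock()
        self._records: Dict[Tuple[str, str], EventRecord] = {}

    def push(self, object_id: str, reason: str, message: str) -> None:
        """Called by the scheduler; duplicates bump a counter instead of growing the cache."""
        now = self._clock()
        key = (object_id, reason)
        with self._lock:
            record = self._records.get(key)
            if record is None:
                self._records[key] = EventRecord(
                    object_id, reason, message, 1, now, now)
            else:
                record.count += 1
                record.last_seen = now
                record.message = message

    def query(self, object_id: str) -> List[EventRecord]:
        """Read path (e.g. behind a REST handler); returns copies, keeps the cache."""
        with self._lock:
            return [replace(r) for r in self._records.values()
                    if r.object_id == object_id]

    def flush(self) -> List[EventRecord]:
        """Drain path (e.g. a periodic publisher); returns all records and empties the cache."""
        with self._lock:
            drained = list(self._records.values())
            self._records = {}
        return drained


if __name__ == "__main__":
    cache = EventCache()
    for _ in range(3):
        cache.push("default/pod-a", "QueueLimitReached", "unable to allocate pod")
    for record in cache.flush():
        print(record.describe())
```

Keying on (object_id, reason) keeps the cache size bounded by the number of distinct failure causes rather than by the number of scheduling attempts, which is the property point 3) asks for. A real schema would likely need more fields, and how the flushed records map onto k8s events is left open here.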