[
https://issues.apache.org/jira/browse/YUNIKORN-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091556#comment-17091556
]
Adam Antal commented on YUNIKORN-42:
------------------------------------
[~wwei] thanks for assigning this to me. I know that you are all kinda busy with
the release, so I wrote down a few thoughts and you can respond to them when
you have time.
I recently went through the design doc and browsed the Scheduler Interface to
get more insight into the purpose of this jira. I think [~wangda]'s original
implementation plan is a good one; I would make a few suggestions based on my
and [~wilfreds]'s previous comments.
I would like to approach the pod, application, queue and node cases separately.
For *pods* it's logical to pass these events from the scheduler to the shim,
and the shim can then emit them to the k8s event system. End-users can then
{{kubectl describe}} the pending pod to see any errors the scheduler emits.
I'd like to change the way this is done, though.
Since new pods are requested through {{AllocationAsk}} in {{UpdateRequest}},
the proposed {{DiagnosticInformation}} in {{UpdateResponse}} is too broad
for this purpose. I'd put it into {{RejectedAllocationAsk}}, but as I can see
we already have a reason string there that describes the rejection. Could we
perhaps leverage that?
Since *nodes* are also ResourceManager-dependent objects, I'd do something
similar for emitting node-related events. Searching the SI, I found the
{{AcceptedNode}} and {{RejectedNode}} objects - can we also use these for
the event system?
*Queues* are scheduler-level concepts, so these should not be passed through
the SI.
With regards to *applications*: I have the impression that applications are
RM-level concepts, because they are included in the SI protocol. That being
said, we also have to provide some diagnostics on that level, but there is no
such utility as {{kubectl describe application}} on the k8s side - so the
question is: do we really need to do that?
One idea is that the shim could also maintain CRDs that represent
applications, and those objects could be the target of these events. This
would obviously be handled by the shim, and could be synchronized with the
Spark / other applications' state (where we actually have no need to
communicate this with the scheduler continuously).
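To make the CRD idea concrete, a hypothetical application object maintained by the shim could look like the sketch below. Every name here (the API group, the {{Application}} kind, the fields) is purely illustrative and not an existing YuniKorn CRD:

```yaml
# Hypothetical application CRD instance maintained by the shim;
# group, kind, and field names are illustrative only.
apiVersion: yunikorn.example.org/v1alpha1
kind: Application
metadata:
  name: spark-pi-001
  namespace: default
spec:
  queue: root.default
status:
  # synchronized with the framework's (e.g. Spark's) application state
  state: Running
```

Application-level events would then have a natural target object, and the user could inspect them the same way they inspect a pod today.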
I see some advantage of these CRDs in contexts like work-preserving recovery
(I am not aware of how this is currently handled in K8s): it would be pretty
straightforward to just read back the CRDs when an RM has to resync its state.
As for the event cache in the scheduler component, I think [~wangda]'s proposal
is good: we also need a way to approach the problem from the scheduler's
perspective, so I'd definitely like to keep that piece of the architecture.
Please share your opinion on this. I will create an updated POC document with
the things we discuss in this thread. I welcome your thoughts/constructive
criticism.
> Better to support POD events for YuniKorn to troubleshoot allocation failures
> -----------------------------------------------------------------------------
>
> Key: YUNIKORN-42
> URL: https://issues.apache.org/jira/browse/YUNIKORN-42
> Project: Apache YuniKorn
> Issue Type: Task
> Reporter: Wangda Tan
> Assignee: Adam Antal
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> It is currently tricky to troubleshoot pod allocation; we need to better
> expose this information in the POD description.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)