[ 
https://issues.apache.org/jira/browse/YUNIKORN-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805835#comment-17805835
 ] 

Rainie Li commented on YUNIKORN-2322:
-------------------------------------

Thanks [~pbacsko] for providing the info.

We are not seeing "queue update failed unexpectedly" messages.

In our case, YuniKorn service latency stayed high for a while (please refer 
to the latest screenshot), and it could not schedule any apps even though the 
cluster had resources. We suspect the service is hanging for some reason.

> Investigate YuniKorn stuck when scheduling latency is high
> ----------------------------------------------------------
>
>                 Key: YUNIKORN-2322
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2322
>             Project: Apache YuniKorn
>          Issue Type: Task
>          Components: core - common
>            Reporter: Rainie Li
>            Assignee: Rainie Li
>            Priority: Major
>         Attachments: Screenshot 2024-01-10 at 4.31.52 PM.png, Screenshot 
> 2024-01-10 at 4.33.40 PM.png, Screenshot 2024-01-11 at 3.40.48 PM-1.png, 
> Screenshot 2024-01-11 at 3.40.48 PM.png
>
>
> We are seeing the service get stuck when latency increases: even though the 
> cluster has resources, YuniKorn is unable to schedule apps. We have to 
> manually restart YuniKorn.
> We did profiling and found that most of the time is spent in 
> *tryReservedAllocate*. Attached are the profiling screenshot and service 
> latency data.



