[ 
https://issues.apache.org/jira/browse/YUNIKORN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg resolved YUNIKORN-3076.
---------------------------------------------
    Fix Version/s: 1.7.0
                   1.8.0
       Resolution: Fixed

Change committed to master and cherry-picked into branch-1.7.

Thank you [~mitdesai] for the fix.

> Web UI Fails to Load Applications for Certain Queues on Heavily Loaded 
> Clusters
> -------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3076
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3076
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 1.5.2, 1.6.0, 1.6.1, 1.6.2, 1.6.3
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.8.0
>
>
> On heavily loaded clusters, the Web UI intermittently fails to load 
> applications for certain queues. Allocations and resource usage are reported 
> correctly, but the list of applications does not load for some queues. There 
> is no reliable way to reproduce the problem, but it has been observed 
> frequently.
> {*}Initial Assumptions{*}: Initially, it was assumed that this issue could be 
> due to a large payload being exchanged between the scheduler and the web UI, 
> causing network latency or client-side parsing delays for a large number of 
> applications/pods. However, this does not seem to be the case, as the issue 
> was observed yesterday on a queue with just 3 applications and approximately 
> 200 pods.
> {*}Root Cause{*}: Upon further debugging, it was found that not all 
> applications come back with a 'stateLog' object. When the UI rendering 
> occurs, there is an unconditional access to the stateLog object, which fails 
> for applications that do not have it. This causes the rendering process to 
> fail and results in a blank applications page.
> {*}Steps to Validate{*}:
>  # When the Web UI shows this problem, open the browser's inspect panel and 
> navigate to the network tab.
>  # Clear any existing network items. Do this again whenever you move to a 
> different queue, as the UI caches the applications object unless the page 
> is refreshed.
>  # Go to the applications tab and select the desired queue from the drop-down 
> menu.
>  # An 'Applications' entry should appear in the network tab, showing the 
> payload it received.
>  # If the UI is not loading the applications, the payload will contain an 
> application with {{applicationState=New}} that does not have a stateLog 
> object.
> {*}Proposed Solution{*}: Modify the UI rendering logic to handle cases where 
> the stateLog object is missing, ensuring that it does not fail and give up on 
> rendering the entire applications page. Implement error handling to either 
> skip or provide a default value for applications without a stateLog object.
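
The validation check and the proposed handling can be sketched in TypeScript
as below. This is a minimal illustration, not the committed fix: the
AppPayload and StateLogEntry shapes, the field names other than stateLog and
applicationState, and both helper functions are assumptions made for the
example.

```typescript
// Hypothetical shape of one entry in the applications payload. Only
// stateLog and applicationState come from the issue text; the other
// field names are assumptions for this sketch.
interface StateLogEntry {
  time: number;
  applicationState: string;
}

interface AppPayload {
  applicationID: string;
  applicationState: string;
  stateLog?: StateLogEntry[] | null; // may be missing, e.g. for New applications
}

// Validation helper: returns the IDs of applications whose payload has no
// usable stateLog, i.e. the entries that would trip an unconditional access.
function findAppsMissingStateLog(apps: AppPayload[]): string[] {
  return apps
    .filter((app) => !Array.isArray(app.stateLog) || app.stateLog.length === 0)
    .map((app) => app.applicationID);
}

// Defensive accessor: returns the time of the last stateLog entry, or the
// supplied fallback when the log is missing or empty, instead of throwing.
function lastStateTime(
  app: AppPayload,
  fallback: number | null = null
): number | null {
  const log = app.stateLog ?? [];
  if (log.length === 0) {
    return fallback;
  }
  return log[log.length - 1].time;
}
```

With an accessor like this, an application without a stateLog is rendered
with a default value rather than aborting the rendering of the whole
applications page.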



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
For additional commands, e-mail: dev-h...@yunikorn.apache.org
