[ 
https://issues.apache.org/jira/browse/YUNIKORN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YUNIKORN-3076:
--------------------------------------------
        Fix Version/s:     (was: 1.6.0)
                           (was: 1.5.2)
                           (was: 1.6.1)
                           (was: 1.6.2)
                           (was: 1.6.3)
       Target Version: 1.7.0
    Affects Version/s: 1.6.2
                       1.6.1
                       1.6.0
                       1.5.2
                       1.6.3

Setting the target to 1.7.0. If the change is small and simple, we could land
it in that release; otherwise we push it out to the next one.

> Web UI Fails to Load Applications for Certain Queues on Heavily Loaded 
> Clusters
> -------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3076
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3076
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 1.5.2, 1.6.0, 1.6.1, 1.6.2, 1.6.3
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>
> On heavily loaded clusters, the Web UI intermittently fails to load
> applications for certain queues. The allocations and resource usage are
> reported correctly, but the list of applications does not load for some
> queues. There is no definitive way to reproduce this scenario, but it has
> been observed frequently.
> {*}Initial Assumptions{*}: Initially, it was assumed that this issue could be 
> due to a large payload being exchanged between the scheduler and the web UI, 
> causing network latency or client-side parsing delays for a large number of 
> applications/pods. However, this does not seem to be the case, as the issue 
> was observed yesterday on a queue with just 3 applications and approximately 
> 200 pods.
> {*}Root Cause{*}: Upon further debugging, it was found that not all 
> applications come back with a 'stateLog' object. When the UI rendering 
> occurs, there is an unconditional access to the stateLog object, which fails 
> for applications that do not have it. This causes the rendering process to 
> fail and results in a blank applications page.
> {*}Steps to Validate{*}:
>  # When experiencing such issues in the Web UI, open the inspect panel and 
> navigate to the network tab.
>  # Clear any existing network items. Note: clear them again whenever you
> move to a different queue, because the UI caches the applications object
> until the page is refreshed.
>  # Go to the applications tab and select the desired queue from the drop-down 
> menu.
>  # An 'Applications' request should appear in the network tab, showing the
> payload the UI received.
>  # If the UI is not loading the applications, there will be an application 
> with {{applicationState=New}} that does not have a stateLog object.
> {*}Proposed Solution{*}: Modify the UI rendering logic to handle cases where
> the stateLog object is missing, so that one such application no longer
> aborts rendering of the entire applications page. Implement error handling
> that either skips applications without a stateLog object or gives them a
> default value.
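
A minimal sketch of the kind of guard the proposed solution describes, not the
actual YuniKorn web code. The interface shapes, the field names other than
stateLog and applicationState, and all three helper functions are hypothetical
and only illustrate the idea: treat a missing stateLog as an empty list rather
than dereferencing it unconditionally. The last helper mirrors the validation
steps by listing the applications in a captured payload that lack a stateLog.

```typescript
// Hypothetical payload shapes: only `stateLog` and `applicationState` are
// named in the issue; every other field name here is an assumption.
interface StateLogEntry {
  time: number;
  applicationState: string;
}

interface AppPayload {
  applicationID: string;
  applicationState: string;
  stateLog?: StateLogEntry[] | null; // may be absent, e.g. for state "New"
}

// Return the state log, or an empty list when the payload omitted it.
function safeStateLog(app: AppPayload): StateLogEntry[] {
  return Array.isArray(app.stateLog) ? app.stateLog : [];
}

// Time of the most recent state change, or null when there is no log.
function lastStateChangeTime(app: AppPayload): number | null {
  const log = safeStateLog(app);
  return log.length > 0 ? log[log.length - 1].time : null;
}

// IDs of applications whose payload has no stateLog at all.
function appsMissingStateLog(apps: AppPayload[]): string[] {
  return apps
    .filter((app) => !Array.isArray(app.stateLog))
    .map((app) => app.applicationID);
}
```

With helpers like these, the rendering code can show a placeholder for the
missing value and continue with the remaining applications in the queue.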



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
