[
https://issues.apache.org/jira/browse/YUNIKORN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wilfred Spiegelenburg updated YUNIKORN-3076:
--------------------------------------------
Target Version: 1.7.0, 1.8.0 (was: 1.7.0)
> Web UI Fails to Load Applications for Certain Queues on Heavily Loaded
> Clusters
> -------------------------------------------------------------------------------
>
> Key: YUNIKORN-3076
> URL: https://issues.apache.org/jira/browse/YUNIKORN-3076
> Project: Apache YuniKorn
> Issue Type: Bug
> Components: webapp
> Affects Versions: 1.5.2, 1.6.0, 1.6.1, 1.6.2, 1.6.3
> Reporter: Mit Desai
> Assignee: Mit Desai
> Priority: Major
> Labels: pull-request-available
>
> On heavily loaded clusters, the Web UI randomly fails to load applications
> for certain queues. The allocations and resource usage are reported
> correctly, but the list of applications does not load for some queues. There
> is no definitive way to reproduce this scenario, but it has been observed
> frequently.
> {*}Initial Assumptions{*}: The issue was first attributed to a large payload
> being exchanged between the scheduler and the web UI, causing network latency
> or client-side parsing delays for a large number of applications/pods.
> However, this does not seem to be the case, as the issue was observed
> yesterday on a queue with just 3 applications and approximately 200 pods.
> {*}Root Cause{*}: Further debugging showed that not all applications come
> back with a 'stateLog' object. The UI rendering code accesses the stateLog
> object unconditionally, which fails for applications that do not have it.
> This causes the rendering process to fail and results in a blank
> applications page.
> {*}Steps to Validate{*}:
> # When experiencing such issues in the Web UI, open the inspect panel and
> navigate to the network tab.
> # Clear any existing network items. Note: Clear the network items if you are
> moving to a different queue, as the UI will cache the applications object
> unless the page is refreshed.
> # Go to the applications tab and select the desired queue from the drop-down
> menu.
> # An 'Applications' request should appear in the network tab, showing the
> payload the UI received.
> # If the UI is not loading the applications, there will be an application
> with {{applicationState=New}} that does not have a stateLog object.
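The check in the last step can also be done programmatically on a payload copied from the network tab. The following is only a sketch: the `AppEntry` shape and the `applicationID` field are assumptions made for illustration, and only `applicationState` and `stateLog` are named in this issue.

```typescript
// Hypothetical, trimmed-down shape of one entry in the applications payload.
// Only applicationState and stateLog come from the issue text; applicationID
// is an assumed identifier field.
interface AppEntry {
  applicationID: string;
  applicationState: string;
  stateLog?: unknown[] | null;
}

// Return the entries whose stateLog is absent, null, or not an array.
function findAppsMissingStateLog(apps: AppEntry[]): AppEntry[] {
  return apps.filter((app) => !Array.isArray(app.stateLog));
}

// Made-up sample data, not taken from a real cluster.
const samplePayload: AppEntry[] = [
  { applicationID: "app-1", applicationState: "Running", stateLog: [] },
  { applicationID: "app-2", applicationState: "New" },
];

for (const app of findAppsMissingStateLog(samplePayload)) {
  console.log(`${app.applicationID} (${app.applicationState}) has no stateLog`);
}
```

If the list is non-empty for the affected queue, that would be consistent with the root cause described above.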
> {*}Proposed Solution{*}: Modify the UI rendering logic to handle cases where
> the stateLog object is missing, ensuring that it does not fail and give up on
> rendering the entire applications page. Implement error handling to either
> skip or provide a default value for applications without a stateLog object.
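One way the proposed handling could look is sketched below. This is not the actual web UI code: the `AppInfo`, `StateLogEntry`, and `AppRow` types, the `time` field, and both function names are hypothetical and chosen only to illustrate defaulting a missing stateLog instead of failing the whole page.

```typescript
// Hypothetical types; only stateLog and applicationState are named in the issue.
interface StateLogEntry {
  time: number; // assumed timestamp field
  applicationState: string;
}

interface AppInfo {
  applicationID: string; // assumed identifier field
  applicationState: string;
  stateLog?: StateLogEntry[] | null;
}

interface AppRow {
  id: string;
  state: string;
  lastStateTime: number | null; // null when no stateLog is available
}

// Treat a missing or null stateLog as an empty log rather than dereferencing it.
function lastStateTime(app: AppInfo): number | null {
  const log = app.stateLog ?? [];
  return log.length > 0 ? log[log.length - 1].time : null;
}

// Build one row per application; an application without a stateLog still
// produces a row, so the rest of the page can render.
function toRows(apps: AppInfo[]): AppRow[] {
  return apps.map((app) => ({
    id: app.applicationID,
    state: app.applicationState,
    lastStateTime: lastStateTime(app),
  }));
}
```

Here the default-value option is used; skipping such applications entirely would be the other option mentioned in the proposal, at the cost of hiding them from the list.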