[ https://issues.apache.org/jira/browse/YUNIKORN-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wilfred Spiegelenburg resolved YUNIKORN-3076.
---------------------------------------------
    Fix Version/s: 1.7.0
                   1.8.0
       Resolution: Fixed

Change committed to master and cherry-picked into the branch-1.7

Thank you [~mitdesai] for the fix.

> Web UI Fails to Load Applications for Certain Queues on Heavily Loaded
> Clusters
> -------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3076
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3076
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 1.5.2, 1.6.0, 1.6.1, 1.6.2, 1.6.3
>            Reporter: Mit Desai
>            Assignee: Mit Desai
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0, 1.8.0
>
>
> On heavily loaded clusters, the Web UI randomly fails to load applications
> for certain queues. The allocations and resource usage are reported
> correctly, but the list of applications does not load for some queues. There
> is no definitive way to reproduce this scenario, but it has been observed
> frequently.
>
> {*}Initial Assumptions{*}: Initially, it was assumed that this issue could be
> due to a large payload being exchanged between the scheduler and the web UI,
> causing network latency or client-side parsing delays for a large number of
> applications/pods. However, this does not seem to be the case, as the issue
> was observed yesterday on a queue with just 3 applications and approximately
> 200 pods.
>
> {*}Root Cause{*}: Upon further debugging, it was found that not all
> applications come back with a 'stateLog' object. When the UI rendering
> occurs, there is an unconditional access to the stateLog object, which fails
> for applications that do not have it. This causes the rendering process to
> fail and results in a blank applications page.
>
> {*}Steps to Validate{*}:
> # When experiencing such issues in the Web UI, open the inspect panel and
> navigate to the network tab.
> # Clear any existing network items. Note: Clear the network items if you are
> moving to a different queue, as the UI will cache the applications object
> unless the page is refreshed.
> # Go to the applications tab and select the desired queue from the drop-down
> menu.
> # An 'Applications' tab should appear in the network tab, showing the
> payload it received.
> # If the UI is not loading the applications, there will be an application
> with {{applicationState=New}} that does not have a stateLog object.
>
> {*}Proposed Solution{*}: Modify the UI rendering logic to handle cases where
> the stateLog object is missing, ensuring that it does not fail and give up on
> rendering the entire applications page. Implement error handling to either
> skip or provide a default value for applications without a stateLog object.
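
The proposed solution can be illustrated with a minimal TypeScript sketch. This is not the committed fix, only one way a renderer might tolerate a missing stateLog. The interfaces, the field names inside each stateLog entry, and the helpers `lastStateChangeTime` and `toRows` are hypothetical names chosen for the example; the real payload and UI code may differ.

```typescript
// Hypothetical shapes for illustration only; the real payload has more fields.
interface StateLogEntry {
  time: number;             // assumed: timestamp of the state transition
  applicationState: string; // assumed: state entered at that time
}

interface AppPayload {
  applicationID: string;
  applicationState: string;
  stateLog?: StateLogEntry[] | null; // may be absent, e.g. for applications in state New
}

interface AppRow {
  applicationId: string;
  state: string;
  lastStateChangeTime: number | null; // null is the default when no stateLog exists
}

// Returns the most recent transition time, or null when there is no usable stateLog.
function lastStateChangeTime(app: AppPayload): number | null {
  const log = app.stateLog ?? [];
  if (log.length === 0) {
    return null;
  }
  return log.reduce((latest, entry) => Math.max(latest, entry.time), log[0].time);
}

// Builds one row per application, so a missing stateLog on one
// application does not abort rendering of the whole list.
function toRows(apps: AppPayload[]): AppRow[] {
  return apps.map((app) => ({
    applicationId: app.applicationID,
    state: app.applicationState,
    lastStateChangeTime: lastStateChangeTime(app),
  }));
}
```

With this approach an application in state New still appears in the list, with an empty last-state-change value, instead of causing a blank applications page. Skipping such applications entirely would be the other option named in the proposal, at the cost of hiding them from the user.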