[ https://issues.apache.org/jira/browse/YUNIKORN-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko reassigned YUNIKORN-3089:
--------------------------------------

    Assignee: Peter Bacsko

> Web UI shows stale "New" state applications that are no longer present in the cluster
> -------------------------------------------------------------------------------------
>
>                 Key: YUNIKORN-3089
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3089
>             Project: Apache YuniKorn
>          Issue Type: Bug
>          Components: core - scheduler
>            Reporter: Mit Desai
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: yunikorn-spark-cd26dba9a9d54b2089eafe73562efc4d.log
>
>
> We are experiencing an issue where the YuniKorn Web UI continues to display applications in the *New* state, even though these applications are no longer present in the Kubernetes cluster. The list of such stale applications grows over time while the scheduler is running and is cleared only by a scheduler restart. In one instance, we observed this list grow to more than 1,200 stale applications.
> This issue is reproducible even with the *1.6.3 build* running with the *YUNIKORN-3084 patch* applied.
> *Steps to Reproduce:*
> # Create pods that fail immediately due to constraints (e.g., Kyverno policy violations).
> # Observe in the Web UI that the applications remain in the New state even after the pods are deleted from the cluster.
> # Over time, the list of applications in the New state keeps growing.
> # Restarting the scheduler resets the list, but the problem reappears as the scheduler continues to run.
> *Observations:*
> * Applications remain in the *New* state in the Web UI even after their corresponding pods are deleted from the cluster.
> * The problem appears to be related to the order and timing of the create/delete events received by the core.
> * When a pod fails immediately (e.g., due to a Kyverno policy violation), the shim receives both create and delete requests, but the core does not create the application in the partition context in time for the delete to be processed.
> * The core eventually receives the create request, but the corresponding delete request had already arrived before it, so the application remains in the New state indefinitely.
> * The shim does not take any further action, leaving the application in this stale state until the scheduler is restarted.
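The ordering problem in the observations can be reproduced with a toy model. The Go sketch below is illustrative only: `partition`, `addApp`, `removeApp` and the tombstone guard are hypothetical names, not YuniKorn core APIs. It assumes the core silently ignores a delete for an application it does not yet know about. The tombstone variant is one possible mitigation shown for contrast, not the fix adopted for this issue.

```go
package main

import "fmt"

// partition is a hypothetical, heavily simplified stand-in for the core's
// partition context; it is NOT YuniKorn's actual partition implementation.
type partition struct {
	apps          map[string]string   // application ID -> state
	tombstones    map[string]struct{} // IDs deleted before they were ever added
	useTombstones bool                // hypothetical mitigation switch
}

func newPartition(useTombstones bool) *partition {
	return &partition{
		apps:          make(map[string]string),
		tombstones:    make(map[string]struct{}),
		useTombstones: useTombstones,
	}
}

// removeApp models a delete request arriving from the shim.
func (p *partition) removeApp(id string) {
	if _, ok := p.apps[id]; ok {
		delete(p.apps, id)
		return
	}
	// Unknown application: without tombstones the delete is simply dropped.
	if p.useTombstones {
		p.tombstones[id] = struct{}{}
	}
}

// addApp models a (possibly late) create request arriving from the shim.
func (p *partition) addApp(id string) {
	if _, deleted := p.tombstones[id]; deleted {
		// The delete already arrived, so discard the late create.
		delete(p.tombstones, id)
		return
	}
	p.apps[id] = "New"
}

func main() {
	for _, guard := range []bool{false, true} {
		p := newPartition(guard)
		// Pod rejected immediately: the delete reaches the core before the create.
		p.removeApp("app-1")
		p.addApp("app-1")
		fmt.Printf("tombstones=%v stale apps=%v\n", guard, p.apps)
	}
	// Prints:
	// tombstones=false stale apps=map[app-1:New]
	// tombstones=true stale apps=map[]
}
```

Without the guard, the late create leaves `app-1` in the New state with nothing left to remove it, which matches the stale entries seen in the Web UI. A real mitigation along these lines would also need to expire tombstones, since deletes for applications that never arrive would otherwise accumulate.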