Mit Desai created YUNIKORN-3089:
-----------------------------------

             Summary: Web UI shows stale "New" state applications that are no 
longer present in the cluster
                 Key: YUNIKORN-3089
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3089
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Mit Desai


We are experiencing an issue where the YuniKorn Web UI continues to display 
applications in the *New* state, even though these applications are no longer 
present in the Kubernetes cluster. The list of such stale applications grows 
over time while the scheduler is running, and is cleared only upon a scheduler 
restart. In one instance, we observed this list growing to more than 1,200 stale 
applications.

This issue is reproducible even with the *1.6.3 build* running with the 
*YUNIKORN-3084 patch* applied.

*Steps to Reproduce:*
 # Create pods that fail immediately due to constraints (e.g., Kyverno policy 
violations).
 # Observe in the Web UI that applications remain in the New state even after 
the pods are deleted from the cluster.
 # Over time, the list of applications in the New state keeps growing.
 # Restarting the scheduler resets the list, but the problem reappears as the 
scheduler continues to run.

*Observations:*
 * Applications remain in the *New* state in the Web UI, even after their 
corresponding pods are deleted from the cluster.
 * The problem appears to be related to the order and timing of create/delete 
events received by the core.
 * When a pod fails immediately (e.g., due to Kyverno policy violations), the 
shim receives both create and delete requests, but the core does not create the 
app in the partition context in time for the delete to be processed.
 * The core eventually receives the create request, but the corresponding 
delete had already been received before it, resulting in the application 
remaining in the New state indefinitely.
 * The shim does not take any further action, leaving the application in this 
stale state until a scheduler restart.
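
The ordering problem described in the observations can be illustrated with a toy model. This is only a sketch under stated assumptions, not YuniKorn code: toyCore, event, createApp, deleteApp and countNew are hypothetical names, and the model assumes that a delete for an application the core does not yet know about is silently dropped.

```go
package main

import "fmt"

// eventKind is a hypothetical stand-in for the create/delete requests
// described above; it is not a YuniKorn type.
type eventKind int

const (
	createApp eventKind = iota
	deleteApp
)

type event struct {
	kind  eventKind
	appID string
}

// toyCore is a hypothetical, heavily simplified model of the core's
// partition context: a map from application ID to state.
type toyCore struct {
	apps map[string]string
}

func newToyCore() *toyCore {
	return &toyCore{apps: make(map[string]string)}
}

// handle applies one event. Assumption: a delete for an application that
// is not registered yet is dropped and never retried.
func (c *toyCore) handle(e event) {
	switch e.kind {
	case createApp:
		c.apps[e.appID] = "New"
	case deleteApp:
		delete(c.apps, e.appID) // no-op when the app is unknown
	}
}

// countNew replays the events in the given order on a fresh toyCore and
// returns how many applications are left in the New state.
func countNew(events []event) int {
	c := newToyCore()
	for _, e := range events {
		c.handle(e)
	}
	n := 0
	for _, state := range c.apps {
		if state == "New" {
			n++
		}
	}
	return n
}

func main() {
	inOrder := []event{{createApp, "app-1"}, {deleteApp, "app-1"}}
	reordered := []event{{deleteApp, "app-1"}, {createApp, "app-1"}}

	fmt.Println("create then delete:", countNew(inOrder))   // prints "create then delete: 0"
	fmt.Println("delete then create:", countNew(reordered)) // prints "delete then create: 1"
}
```

In this model the application survives only when the delete is handled before the create, which matches the stale New entries seen in the Web UI. Whether the real core drops the early delete in the same way is an assumption of the sketch.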

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

