Yu-Lin Chen created YUNIKORN-2292:
-------------------------------------
Summary: Flaky E2E Test: Orphan pods still exists after
TearDownNamespace()
Key: YUNIKORN-2292
URL: https://issues.apache.org/jira/browse/YUNIKORN-2292
Project: Apache YuniKorn
Issue Type: Bug
Components: test - e2e
Reporter: Yu-Lin Chen
Assignee: Yu-Lin Chen
The current [TearDownNamespace()
|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/framework/helpers/k8s/k8s_utils.go#L393]function
delete pods before deleting namesapce. However, there might be a Job still
alive that can recreate the deleted Pods. The incompleted cleanup will cause
other e2e test to fail.
For example: This failed E2E test ([PR
#742|https://github.com/apache/yunikorn-k8shim/actions/runs/7078108545/job/19271866521?pr=742#step:6:3952])
was caused by an orphan pod, which was created by a Job submitted in
[recovery_and_restart_test.go#L339|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/recovery_and_restart/recovery_and_restart_test.go#L339]
**
**
{*}Time series{*}:
* 2023-12-04T04:39:38.4083946Z Submit gang job. (recovery_and_restart)
* 2023-12-04T04:39:46.8767068Z {color:#de350b}Tear down namespace{color}:
devjyy8x (Delete pod → Delete Namesapce)
* 2023-12-04T04:39:47.0512900Z Tear down namespace completed (One orphan
remains)
* 2023-12-04T04:39:50.5110355Z Restart the scheduler pod. (resource_fairness)
* 2023-12-04T04:39:52.178Z ERROR Failed to add application to partition
(placement rejected)
→ {color:#de350b}The orphan pod led to node removal in scheduler after
recovery. {color}
* 2023-12-04T04:41:35.9084137Z test faield. (simple_preemptor e2e )
* 2023-12-04T04:41:36.3406477Z Cluster dump output shows 1 orphan pod with age
110s ({color:#de350b}The pod was recreated when running
TearDownNamespace(){color} )
{*}Solution{*}:
In addition to calling deletePod, we should also check and delete Jobs in
[TearDownNamespace()
|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/framework/helpers/k8s/k8s_utils.go#L393]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]