Yu-Lin Chen created YUNIKORN-2067:
-------------------------------------
Summary: Test_With_Spark_Jobs e2e test wait for app state Running after Spark job completed
Key: YUNIKORN-2067
URL: https://issues.apache.org/jira/browse/YUNIKORN-2067
Project: Apache YuniKorn
Issue Type: Bug
Components: test - e2e
Reporter: Yu-Lin Chen
Assignee: Yu-Lin Chen
The e2e test 'Test_With_Spark_Jobs' waits, one after another, for the 3 Spark applications
to reach the 'Running' state, which is incorrect: we can’t guarantee that the jobs are
still running by the time we perform the check.
We should check the Spark driver pod state through the KubeCtl client instead of
YuniKorn’s RestClient, because the application is removed from the core
once it has completed.
Link to code:
[test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149]
Failed e2e test link:
[https://github.com/apache/yunikorn-k8shim/actions/runs/6596046649/job/17926552721#step:5:2098]
Failed e2e test log analysis:
* 17:18:09Z Pod for app spark-e27dd9a2140844828fdfb3d80e9fa1b4 created
* 17:18:11.725869Z (PodEvent in Log) PodEvent ‘Scheduling’ received
* 17:18:11.727811Z (PodEvent in Log) PodEvent ‘Scheduled’ received
* 17:18:11.735646Z (PodEvent in Log) PodEvent ‘PodBindSuccessful’ received
* {color:#4c9aff}17:20:10.965501Z (PodEvent in Log) PodEvent ‘TaskCompleted’ received{color} {color:#de350b}(Completed before the check.){color}
* 17:20:20.159 (Ginkgo) Waiting for application spark-e27dd9a2140844828fdfb3d80e9fa1b4 to Running
* 17:26:25.9749 (Ginkgo) timeout
--
This message was sent by Atlassian Jira
(v8.20.10#820010)