Yu-Lin Chen created YUNIKORN-2067:
-------------------------------------

             Summary: Test_With_Spark_Jobs e2e test wait for app state Running 
after Spark job completed
                 Key: YUNIKORN-2067
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2067
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: test - e2e
            Reporter: Yu-Lin Chen
            Assignee: Yu-Lin Chen


The e2e test 'Test_With_Spark_Jobs' waits for each of the 3 Spark applications, 
one after another, to reach the 'Running' state, which is incorrect. We can’t 
guarantee that a job is still running by the time we perform the check.

We should check the Spark driver pod state through the KubeCtl client instead of 
YuniKorn’s RestClient, because the application is removed from the core 
after it has completed.
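
A minimal sketch of the pod-based check proposed above, assuming the test only needs to know that the driver pod has started or already finished. The names phaseGetter and waitForDriverStartedOrDone are hypothetical and are not part of the YuniKorn test framework; in the real test the getter would be backed by a Kubernetes client lookup of the driver pod. The phase strings are the standard Kubernetes pod phases.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Standard Kubernetes pod phases, written as plain strings so the sketch
// has no dependency on client-go.
const (
	PodPending   = "Pending"
	PodRunning   = "Running"
	PodSucceeded = "Succeeded"
	PodFailed    = "Failed"
)

// phaseGetter is a hypothetical stand-in for a Kubernetes client call that
// returns the current phase of the Spark driver pod.
type phaseGetter func() (string, error)

// waitForDriverStartedOrDone polls the driver pod phase until the pod is
// Running or has already Succeeded. Accepting Succeeded avoids the race in
// which the job completes before the check starts.
func waitForDriverStartedOrDone(get phaseGetter, interval, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		phase, err := get()
		if err != nil {
			return "", err
		}
		switch phase {
		case PodRunning, PodSucceeded:
			return phase, nil
		case PodFailed:
			return phase, errors.New("driver pod failed")
		}
		if time.Now().After(deadline) {
			return phase, fmt.Errorf("timed out waiting for driver pod, last phase %q", phase)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulate a driver pod that completed before the check was performed.
	completed := func() (string, error) { return PodSucceeded, nil }
	phase, err := waitForDriverStartedOrDone(completed, 10*time.Millisecond, time.Second)
	fmt.Println(phase, err)
}
```

Because the pod object remains in Kubernetes after the application has been removed from the core, a check of this shape still passes for a job that finished early.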

Code link: 
[test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149|https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149]
Failed e2e test link: 
[https://github.com/apache/yunikorn-k8shim/actions/runs/6596046649/job/17926552721#step:5:2098]

Failed e2e test log analysis:
 * 17:18:09Z Pod for app spark-e27dd9a2140844828fdfb3d80e9fa1b4 created
 * 17:18:11.725869Z (PodEvent in Log) PodEvent ‘Scheduling’ received
 * 17:18:11.727811Z (PodEvent in Log) PodEvent ‘Scheduled’ received
 * 17:18:11.735646Z (PodEvent in Log) PodEvent ‘PodBindSuccessful’ received
 * 17:20:10.965501Z (PodEvent in Log) PodEvent ‘TaskCompleted’ received 
(completed before the check)
 * 17:20:20.159 (Ginkgo) Waiting for application 
spark-e27dd9a2140844828fdfb3d80e9fa1b4 to Running
 * 17:26:25.9749 (Ginkgo) timeout
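
The gaps in the timeline above can be computed directly from the timestamps. The following is only an illustrative sketch: the helper name gap is hypothetical, the trailing 'Z' is dropped from the log timestamps, and all readings are assumed to be from the same day.

```go
package main

import (
	"fmt"
	"time"
)

// gap returns the time elapsed between two same-day clock readings written
// as HH:MM:SS with an optional fractional part.
func gap(from, to string) (time.Duration, error) {
	// time.Parse accepts a fractional-seconds field in the input even
	// though the layout does not include one.
	const layout = "15:04:05"
	start, err := time.Parse(layout, from)
	if err != nil {
		return 0, err
	}
	end, err := time.Parse(layout, to)
	if err != nil {
		return 0, err
	}
	return end.Sub(start), nil
}

func main() {
	completedToCheck, err := gap("17:20:10.965501", "17:20:20.159")
	if err != nil {
		panic(err)
	}
	checkToTimeout, err := gap("17:20:20.159", "17:26:25.9749")
	if err != nil {
		panic(err)
	}
	fmt.Println("TaskCompleted -> wait started:", completedToCheck)
	fmt.Println("wait started -> timeout:", checkToTimeout)
}
```

The wait for 'Running' began about 9.2 seconds after the 'TaskCompleted' event, and the test then waited a little over 6 minutes before timing out.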

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
