Yu-Lin Chen created YUNIKORN-2391:
-------------------------------------
Summary: "Test_With_Spark_Jobs" E2E test failed due to driver pod
stuck in Running after job completed
Key: YUNIKORN-2391
URL: https://issues.apache.org/jira/browse/YUNIKORN-2391
Project: Apache YuniKorn
Issue Type: Sub-task
Components: test - e2e
Reporter: Yu-Lin Chen
Assignee: Yu-Lin Chen
Attachments: 7_e2e-tests (v1.26.6, --plugin).txt,
Test_With_Spark_Jobs_k8sClusterInfo.txt,
Test_With_Spark_Jobs_ykContainerLog.txt,
Test_With_Spark_Jobs_ykFullStateDump.json
The "Test_With_Spark_Jobs" E2E test failed with the following details:
- [https://github.com/apache/yunikorn-k8shim/actions/runs/7782705434/job/21229866675]
Three Spark driver pods were created, but one driver never completed.
According to the driver pod logs in "Test_With_Spark_Jobs_k8sClusterInfo.txt",
all three Spark Pi jobs successfully printed the value of Pi. However, one
driver didn't receive a "Shutdown hook" after its SparkContext stopped, so its
pod stayed in Running.
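
The log check described above could be scripted roughly as follows. This is
only a sketch: the marker strings ("Pi is roughly" and "Shutdown hook called")
are assumptions about what the driver logs contain and are not quoted from the
attached files, and classify_driver_log is a hypothetical helper name.

```python
import re

# Assumed marker strings; adjust them to match the actual driver log contents.
PI_RESULT = re.compile(r"Pi is roughly \d+\.\d+")
SHUTDOWN_HOOK = "Shutdown hook called"


def classify_driver_log(log_text: str) -> str:
    """Classify one driver log as 'completed', 'stuck', or 'incomplete'."""
    printed_pi = PI_RESULT.search(log_text) is not None
    saw_shutdown = SHUTDOWN_HOOK in log_text
    if printed_pi and saw_shutdown:
        return "completed"
    if printed_pi:
        # The job produced its result, but no shutdown hook was logged.
        return "stuck"
    return "incomplete"
```

A driver classified as "stuck" matches the failure described here: the job
printed its result, but the shutdown marker never appeared in the log.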
This is not a problem with YuniKorn; it appears to be an issue with Spark on
Kubernetes (a similar issue can be found here: Link).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)