Yu-Lin Chen created YUNIKORN-2391:
-------------------------------------

             Summary: "Test_With_Spark_Jobs" E2E test failed due to driver pod 
stuck in Running after job completed
                 Key: YUNIKORN-2391
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-2391
             Project: Apache YuniKorn
          Issue Type: Sub-task
          Components: test - e2e
            Reporter: Yu-Lin Chen
            Assignee: Yu-Lin Chen
         Attachments: 7_e2e-tests (v1.26.6, --plugin).txt, 
Test_With_Spark_Jobs_k8sClusterInfo.txt, 
Test_With_Spark_Jobs_ykContainerLog.txt, 
Test_With_Spark_Jobs_ykFullStateDump.json

The "Test_With_Spark_Jobs" E2E test failed with the following details:
 - https://github.com/apache/yunikorn-k8shim/actions/runs/7782705434/job/21229866675

Three Spark driver pods were created, but one driver never reached completion.

According to the driver pod logs in "Test_With_Spark_Jobs_k8sClusterInfo.txt", 
all three Spark Pi jobs successfully printed the value of Pi. However, one driver 
never logged a "Shutdown hook" message after its SparkContext stopped, so its 
pod stayed in Running.
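
The log check described above can be sketched as a small script. This is only 
an illustration, not part of the YuniKorn e2e suite: the function names are 
hypothetical, and the three marker strings ("Pi is roughly", "Successfully 
stopped SparkContext", "Shutdown hook called") are assumptions about what the 
Spark driver prints and should be compared against the attached logs.

```python
from typing import Dict, List

# Assumed log markers; verify them against the real driver logs.
PI_MARKER = "Pi is roughly"
STOPPED_MARKER = "Successfully stopped SparkContext"
HOOK_MARKER = "Shutdown hook called"


def driver_shutdown_status(log_text: str) -> Dict[str, bool]:
    """Report which lifecycle markers appear in one driver log."""
    lines = log_text.splitlines()
    # Index of the first line reporting that SparkContext stopped, if any.
    stopped_idx = next(
        (i for i, line in enumerate(lines) if STOPPED_MARKER in line), None
    )
    hook_after_stop = stopped_idx is not None and any(
        HOOK_MARKER in line for line in lines[stopped_idx + 1:]
    )
    return {
        "printed_pi": any(PI_MARKER in line for line in lines),
        "context_stopped": stopped_idx is not None,
        "hook_after_stop": hook_after_stop,
    }


def stuck_drivers(logs_by_pod: Dict[str, str]) -> List[str]:
    """Return pods whose job finished but whose shutdown hook never ran."""
    stuck = []
    for pod, text in sorted(logs_by_pod.items()):
        status = driver_shutdown_status(text)
        if (
            status["printed_pi"]
            and status["context_stopped"]
            and not status["hook_after_stop"]
        ):
            stuck.append(pod)
    return stuck
```

Given a mapping from driver pod name to log text, stuck_drivers() returns the 
pods that match the symptom in this report: the job finished and SparkContext 
stopped, but no shutdown hook followed.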

This does not look like a YuniKorn problem; it appears to be an issue with 
Spark on Kubernetes (a similar issue is reported here: Link).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
