dannyeuu opened a new issue, #39524: URL: https://github.com/apache/airflow/issues/39524
### Apache Airflow Provider(s)

apache-spark

### Versions of Apache Airflow Providers

apache-airflow-providers-apache-hive==6.1.2
apache-airflow-providers-apache-spark==4.1.1
apache-airflow-providers-cncf-kubernetes==7.3.0
apache-airflow-providers-common-sql==1.6.0

### Apache Airflow version

2.6.3

### Operating System

Debian GNU/Linux 11 (bullseye)

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

Dockerfile `FROM apache/airflow:2.6.3-python3.10`, deployed with the Airflow Helm chart.

### What happened

We run lots of jobs every day, but every day, for one or two tasks, Spark completes the job while the Airflow task stays in the running state with no new logs, until we manually change its state to success. For a task with this wrong behavior, the logs end like this:

```
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:13 INFO LoggingPodStatusWatcherImpl: Application status for spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Running)
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2024-05-09, 06:25:13 -03] {spark_submit.py:476} INFO - Identified spark driver pod: reverent-mcclintock-2cf0078f5caa813e-driver
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - pod name: reverent-mcclintock-2cf0078f5caa813e-driver
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - namespace: spark
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - labels: Client -> 3irmaos, Environment -> production, Name -> etl-ira-ciss---3irmaos, Product -> ira, Role -> process, Step -> refinement, airflow-attempt -> 1, airflow-map-index -> , airflow-spark-task-build-version -> spark-k8s-1.0.0, airflow_dag -> cluster_ira_ciss_daily_3irmaos, airflow_task -> load.load_client_product_3irmaos, airflow_task_uuid -> ff01d563-5ea8-555c-8444-413b1f29a5ee, spark-affinty-label -> c8cd4607d6b14f3a81bcce821d0054bb, spark-app-name -> reverent-mcclintock, spark-app-selector -> spark-7edea139decf4f90bd5d0fbac97f8869, spark-role -> driver, spark-version -> 3.4.2, spark_job_execution_date -> 2024-05-09, spark_job_name -> load.load_client_product_3irmaos-2024-05-09, spotinst.io/restrict-scale-down -> true
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - pod uid: 9ed68d10-c28e-4a4b-8136-e68b9be49b9f
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - creation time: 2024-05-09T09:23:51Z
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - service account name: spark
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - volumes: pod-template-volume, spark-local-dir-1, spark-conf-volume-driver, kube-api-access-69jld
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - node name: i-0097e957e30fe4dd5
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - start time: 2024-05-09T09:23:51Z
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - phase: Running
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container status:
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container name: spark-kubernetes-driver
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container image: xxxx.com/xxxxxx/pyspark:v2_patch13
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container state: terminated
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container started at: 2024-05-09T09:23:53Z
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container finished at: 2024-05-09T09:25:13Z
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - exit code: 0
[2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - termination reason: Completed
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Running)
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 INFO LoggingPodStatusWatcherImpl: State changed, new state:
[2024-05-09, 06:25:14 -03] {spark_submit.py:476} INFO - Identified spark driver pod: reverent-mcclintock-2cf0078f5caa813e-driver
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - pod name: reverent-mcclintock-2cf0078f5caa813e-driver
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - namespace: spark
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - labels: Client -> 3irmaos, Environment -> production, Name -> etl-ira-ciss---3irmaos, Product -> ira, Role -> process, Step -> refinement, airflow-attempt -> 1, airflow-map-index -> , airflow-spark-task-build-version -> spark-k8s-1.0.0, airflow_dag -> cluster_ira_ciss_daily_3irmaos, airflow_task -> load.load_client_product_3irmaos, airflow_task_uuid -> ff01d563-5ea8-555c-8444-413b1f29a5ee, spark-affinty-label -> c8cd4607d6b14f3a81bcce821d0054bb, spark-app-name -> reverent-mcclintock, spark-app-selector -> spark-7edea139decf4f90bd5d0fbac97f8869, spark-role -> driver, spark-version -> 3.4.2, spark_job_execution_date -> 2024-05-09, spark_job_name -> load.load_client_product_3irmaos-2024-05-09, spotinst.io/restrict-scale-down -> true
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - pod uid: 9ed68d10-c28e-4a4b-8136-e68b9be49b9f
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - creation time: 2024-05-09T09:23:51Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - service account name: spark
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - volumes: pod-template-volume, spark-local-dir-1, spark-conf-volume-driver, kube-api-access-69jld
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - node name: i-0097e957e30fe4dd5
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - start time: 2024-05-09T09:23:51Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - phase: Succeeded
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container status:
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container name: spark-kubernetes-driver
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container image: xxxx.com/xxxxxx/pyspark:v2_patch13
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container state: terminated
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container started at: 2024-05-09T09:23:53Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container finished at: 2024-05-09T09:25:13Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - exit code: 0
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - termination reason: Completed
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 INFO LoggingPodStatusWatcherImpl: Application status for spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Succeeded)
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 INFO LoggingPodStatusWatcherImpl: Container final statuses:
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO -
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container name: spark-kubernetes-driver
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container image: xxxx.com/xxxxxx/pyspark:v2_patch13
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container state: terminated
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container started at: 2024-05-09T09:23:53Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container finished at: 2024-05-09T09:25:13Z
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - exit code: 0
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - termination reason: Completed
[2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 INFO LoggingPodStatusWatcherImpl: Application reverent-mcclintock with submission ID spark:reverent-mcclintock-2cf0078f5caa813e-driver finished
```

### What you think should happen instead

Expected behavior: for a normal task, the log ends with the `taskinstance.py` lines marking the task as success:

```
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container state: terminated
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container started at: 2024-05-09T09:24:04Z
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container finished at: 2024-05-09T09:25:07Z
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - exit code: 0
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - termination reason: Completed
[2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:08 INFO LoggingPodStatusWatcherImpl: Application elegant-rubin with submission ID spark:elegant-rubin-91cab58f5caaaab2-driver finished
[2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 INFO ShutdownHookManager: Shutdown hook called
[2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-175bae5e-77d3-45b1-aa3c-cc88bdd8a462
[2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-70c6860c-1925-4858-be1f-23e9c6d5723a
[2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 INFO ShutdownHookManager: Deleting directory /tmp/spark-236d1121-2075-4b6e-bb54-2e3e66decd6a
[2024-05-09, 06:25:09 -03] {taskinstance.py:1345} INFO - Marking task as SUCCESS. dag_id=cluster_ira_ciss_daily_3irmaos, task_id=load.load_competitor_store_3irmaos, execution_date=20240508T070000, start_date=20240509T092347, end_date=20240509T092509
[2024-05-09, 06:25:09 -03] {local_task_job_runner.py:225} INFO - Task exited with return code 0
[2024-05-09, 06:25:09 -03] {taskinstance.py:2653} INFO - 0 downstream tasks scheduled from follow-on schedule check
```

### How to reproduce

It's totally random; it happens in different DAGs with different tasks.

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
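For anyone triaging similar reports: the symptom described above is mechanically detectable from a task's log text alone, since the stuck logs contain the driver's "Application ... finished" line from `LoggingPodStatusWatcherImpl` but never reach the `taskinstance.py` "Marking task as SUCCESS" line. Below is a minimal, stdlib-only sketch of such a check; the function name and regex patterns are illustrative (written against the log excerpts in this issue), not part of Airflow or the Spark provider.

```python
import re

# Driver reported that the Spark application finished
# (emitted by Spark's LoggingPodStatusWatcherImpl and relayed by spark_submit.py).
DRIVER_FINISHED = re.compile(
    r"LoggingPodStatusWatcherImpl: Application \S+ with submission ID \S+ finished"
)

# Airflow's own follow-up line that a healthy task logs afterwards.
MARKED_SUCCESS = re.compile(r"\{taskinstance\.py:\d+\} INFO - Marking task as SUCCESS")


def looks_stuck(task_log: str) -> bool:
    """True if the driver finished but Airflow never marked the task SUCCESS."""
    return bool(DRIVER_FINISHED.search(task_log)) and not MARKED_SUCCESS.search(task_log)
```

A check like this could be run periodically over logs of long-running task instances to flag candidates for manual intervention while the root cause is investigated.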