dannyeuu opened a new issue, #39524:
URL: https://github.com/apache/airflow/issues/39524

   ### Apache Airflow Provider(s)
   
   apache-spark
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-apache-hive==6.1.2
   apache-airflow-providers-apache-spark==4.1.1
   apache-airflow-providers-cncf-kubernetes==7.3.0
   apache-airflow-providers-common-sql==1.6.0
   
   ### Apache Airflow version
   
   2.6.3
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Dockerfile built FROM the image apache/airflow:2.6.3-python3.10
   Deployed with the Airflow Helm chart
   
   ### What happened
   
   We run many Spark jobs every day, and for one or two tasks per day Spark 
completes the job but the Airflow task stays in the running state with no new 
logs until we manually change its state to success.
   
   For a task with the wrong behavior, the log ends like this:
   ```
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:13 
INFO LoggingPodStatusWatcherImpl: Application status for 
spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Running)
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:13 
INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2024-05-09, 06:25:13 -03] {spark_submit.py:476} INFO - Identified spark 
driver pod: reverent-mcclintock-2cf0078f5caa813e-driver
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - pod name: 
reverent-mcclintock-2cf0078f5caa813e-driver
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - namespace: spark
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - labels: Client -> 
3irmaos, Environment -> production, Name -> etl-ira-ciss---3irmaos, Product -> 
ira, Role -> process, Step -> refinement, airflow-attempt -> 1, 
airflow-map-index -> , airflow-spark-task-build-version -> spark-k8s-1.0.0, 
airflow_dag -> cluster_ira_ciss_daily_3irmaos, airflow_task -> 
load.load_client_product_3irmaos, airflow_task_uuid -> 
ff01d563-5ea8-555c-8444-413b1f29a5ee, spark-affinty-label -> 
c8cd4607d6b14f3a81bcce821d0054bb, spark-app-name -> reverent-mcclintock, 
spark-app-selector -> spark-7edea139decf4f90bd5d0fbac97f8869, spark-role -> 
driver, spark-version -> 3.4.2, spark_job_execution_date -> 2024-05-09, 
spark_job_name -> load.load_client_product_3irmaos-2024-05-09, 
spotinst.io/restrict-scale-down -> true
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - pod uid: 
9ed68d10-c28e-4a4b-8136-e68b9be49b9f
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - creation time: 
2024-05-09T09:23:51Z
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - service account 
name: spark
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - volumes: 
pod-template-volume, spark-local-dir-1, spark-conf-volume-driver, 
kube-api-access-69jld
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - node name: 
i-0097e957e30fe4dd5
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - start time: 
2024-05-09T09:23:51Z
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - phase: Running
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container status:
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container name: 
spark-kubernetes-driver
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container image: 
xxxx.com/xxxxxx/pyspark:v2_patch13
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container state: 
terminated
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container started 
at: 2024-05-09T09:23:53Z
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - container finished 
at: 2024-05-09T09:25:13Z
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - exit code: 0
   [2024-05-09, 06:25:13 -03] {spark_submit.py:492} INFO - termination reason: 
Completed
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 
INFO LoggingPodStatusWatcherImpl: Application status for 
spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Running)
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 
INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2024-05-09, 06:25:14 -03] {spark_submit.py:476} INFO - Identified spark 
driver pod: reverent-mcclintock-2cf0078f5caa813e-driver
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - pod name: 
reverent-mcclintock-2cf0078f5caa813e-driver
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - namespace: spark
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - labels: Client -> 
3irmaos, Environment -> production, Name -> etl-ira-ciss---3irmaos, Product -> 
ira, Role -> process, Step -> refinement, airflow-attempt -> 1, 
airflow-map-index -> , airflow-spark-task-build-version -> spark-k8s-1.0.0, 
airflow_dag -> cluster_ira_ciss_daily_3irmaos, airflow_task -> 
load.load_client_product_3irmaos, airflow_task_uuid -> 
ff01d563-5ea8-555c-8444-413b1f29a5ee, spark-affinty-label -> 
c8cd4607d6b14f3a81bcce821d0054bb, spark-app-name -> reverent-mcclintock, 
spark-app-selector -> spark-7edea139decf4f90bd5d0fbac97f8869, spark-role -> 
driver, spark-version -> 3.4.2, spark_job_execution_date -> 2024-05-09, 
spark_job_name -> load.load_client_product_3irmaos-2024-05-09, 
spotinst.io/restrict-scale-down -> true
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - pod uid: 
9ed68d10-c28e-4a4b-8136-e68b9be49b9f
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - creation time: 
2024-05-09T09:23:51Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - service account 
name: spark
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - volumes: 
pod-template-volume, spark-local-dir-1, spark-conf-volume-driver, 
kube-api-access-69jld
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - node name: 
i-0097e957e30fe4dd5
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - start time: 
2024-05-09T09:23:51Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - phase: Succeeded
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container status:
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container name: 
spark-kubernetes-driver
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container image: 
xxxx.com/xxxxxx/pyspark:v2_patch13
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container state: 
terminated
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container started 
at: 2024-05-09T09:23:53Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container finished 
at: 2024-05-09T09:25:13Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - exit code: 0
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - termination reason: 
Completed
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 
INFO LoggingPodStatusWatcherImpl: Application status for 
spark-7edea139decf4f90bd5d0fbac97f8869 (phase: Succeeded)
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 
INFO LoggingPodStatusWatcherImpl: Container final statuses:
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container name: 
spark-kubernetes-driver
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container image: 
xxxx.com/xxxxxx/pyspark:v2_patch13
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container state: 
terminated
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container started 
at: 2024-05-09T09:23:53Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - container finished 
at: 2024-05-09T09:25:13Z
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - exit code: 0
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - termination reason: 
Completed
   [2024-05-09, 06:25:14 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:14 
INFO LoggingPodStatusWatcherImpl: Application reverent-mcclintock with 
submission ID spark:reverent-mcclintock-2cf0078f5caa813e-driver finished
   ```
   
   
   ### What you think should happen instead
   
   Expected behavior: for a normal task, the log continues past the Spark 
output and `taskinstance.py` marks the task as success:
   ```
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container state: 
terminated
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container started 
at: 2024-05-09T09:24:04Z
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - container finished 
at: 2024-05-09T09:25:07Z
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - exit code: 0
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - termination reason: 
Completed
   [2024-05-09, 06:25:08 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:08 
INFO LoggingPodStatusWatcherImpl: Application elegant-rubin with submission ID 
spark:elegant-rubin-91cab58f5caaaab2-driver finished
   [2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 
INFO ShutdownHookManager: Shutdown hook called
   [2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 
INFO ShutdownHookManager: Deleting directory 
/tmp/spark-175bae5e-77d3-45b1-aa3c-cc88bdd8a462
   [2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 
INFO ShutdownHookManager: Deleting directory 
/tmp/spark-70c6860c-1925-4858-be1f-23e9c6d5723a
   [2024-05-09, 06:25:09 -03] {spark_submit.py:492} INFO - 24/05/09 09:25:09 
INFO ShutdownHookManager: Deleting directory 
/tmp/spark-236d1121-2075-4b6e-bb54-2e3e66decd6a
   [2024-05-09, 06:25:09 -03] {taskinstance.py:1345} INFO - Marking task as 
SUCCESS. dag_id=cluster_ira_ciss_daily_3irmaos, 
task_id=load.load_competitor_store_3irmaos, execution_date=20240508T070000, 
start_date=20240509T092347, end_date=20240509T092509
   [2024-05-09, 06:25:09 -03] {local_task_job_runner.py:225} INFO - Task exited 
with return code 0
   [2024-05-09, 06:25:09 -03] {taskinstance.py:2653} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
   
   ```
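
The difference between the two excerpts above can be checked mechanically. The following is only a minimal sketch for triaging saved task logs; it is not part of Airflow or the Spark provider, and it says nothing about the cause. The names `unwrap` and `classify` are hypothetical, and the marker strings are copied from the log lines in this report, so they may differ in other Spark or Airflow versions.

```python
import re

# Marker strings copied from the log excerpts in this report (assumption:
# other Spark/Airflow versions may word these lines differently).
SPARK_FINISHED = re.compile(
    r"LoggingPodStatusWatcherImpl: Application \S+ with submission ID \S+ finished"
)
SHUTDOWN_HOOK = "ShutdownHookManager: Shutdown hook called"
MARKED_SUCCESS = "Marking task as SUCCESS"


def unwrap(log_text: str) -> str:
    """Join all lines with single spaces so hard-wrapped records still match."""
    return " ".join(line.strip() for line in log_text.splitlines())


def classify(log_text: str) -> str:
    """Return 'running', 'completed' or 'stuck' for one task log (hypothetical helper).

    'stuck' means Spark reported the application as finished, but the log
    lacks the shutdown hook line or the 'Marking task as SUCCESS' line.
    """
    text = unwrap(log_text)
    if not SPARK_FINISHED.search(text):
        return "running"
    if SHUTDOWN_HOOK in text and MARKED_SUCCESS in text:
        return "completed"
    return "stuck"
```

Run against the first excerpt, `classify` would report `stuck`, because the log stops at the `finished` line; against the second excerpt it would report `completed`.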
   
   ### How to reproduce
   
   It is completely random; it happens in different DAGs with different tasks.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

