Ferdinanddb opened a new issue, #57416:
URL: https://github.com/apache/airflow/issues/57416

   ### Apache Airflow version
   
   3.1.1
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   3.X.X
   
   ### What happened?
   
   Sometimes a SparkApplication is marked as completed (when checking its 
state with `kubectl`) while the corresponding Airflow task has failed. The retry 
mechanism then kicks in and, since the SparkApplication is already marked as 
completed, I believe it tries to delete it, but this fails with the following error:
   
   ```
   ERROR - Task failed with exception
   AttributeError: 'SparkKubernetesOperator' object has no attribute 'launcher'
   ```
   
   For now, I have to manually delete the SparkApplication marked as 
`completed`, then clear the Airflow task so that it is retried.
   
   ### What you think should happen instead?
   
   When a SparkApplication is marked as `completed` but the Airflow 
TaskInstance failed, and the Airflow task is configured with 
`reattach_on_restart=True` and `delete_on_termination=True`, the next 
retry should find the SparkApplication (this already works) and then delete it 
if its state is `completed`, or listen to it if its state is `running` (this is 
not currently the case).
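
The expected behaviour described above could be sketched roughly as follows. This is only an illustration of the decision I have in mind, not the provider's actual code: `RetryAction` and `decide_retry_action` are hypothetical names, the state strings are assumed to be the ones reported in the SparkApplication status, and submitting a new application when none is found is also an assumption.

```python
from enum import Enum
from typing import Optional


class RetryAction(Enum):
    """Hypothetical actions a retry could take (not part of the provider)."""

    SUBMIT_NEW = "submit_new"    # no existing SparkApplication was found
    DELETE = "delete"            # existing application already completed
    REATTACH = "reattach"        # existing application is still running
    UNSPECIFIED = "unspecified"  # case not covered by this issue


def decide_retry_action(
    existing_state: Optional[str],
    reattach_on_restart: bool = True,
    delete_on_termination: bool = True,
) -> RetryAction:
    """Decide what a retry should do with a SparkApplication left by a previous try.

    `existing_state` is the state of the application found on the cluster,
    or None if nothing was found (assumption: a new one is submitted then).
    """
    if existing_state is None:
        return RetryAction.SUBMIT_NEW

    # This issue only describes the case where both flags are enabled.
    if not (reattach_on_restart and delete_on_termination):
        return RetryAction.UNSPECIFIED

    state = existing_state.strip().upper()
    if state == "COMPLETED":
        # Clean up the finished application instead of raising an error.
        return RetryAction.DELETE
    if state == "RUNNING":
        # Listen to the application that is still running.
        return RetryAction.REATTACH
    return RetryAction.UNSPECIFIED
```

With both flags enabled, `decide_retry_action("completed")` gives `RetryAction.DELETE` and `decide_retry_action("running")` gives `RetryAction.REATTACH`. Any other state is deliberately left unspecified, since this issue does not cover it.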
   
   ### How to reproduce
   
   I believe this state can be reached when a SparkApplication keeps 
running in the background after the Airflow task that triggered it has 
failed for a reason unrelated to the SparkApplication.
   
   ### Operating System
   
   docker.io/apache/airflow:3.1.1-python3.12
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow[amazon, async, celery, cncf-kubernetes, google, http, 
postgres, slack, standard, fab, sftp, common-compat, openlineage]==3.1.1
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
