kristopherkane opened a new issue, #33667:
URL: https://github.com/apache/airflow/issues/33667

   ### Description
   
   Google Cloud Dataproc cluster creation should eagerly delete clusters that 
end up in the ERROR state.
   
   It is possible for a Google Cloud Dataproc cluster to be created in the 
ERROR state.  When that happens, the current operator 
(DataprocCreateClusterOperator) requires three total task attempts (the 
original plus two retries) to create the cluster, assuming the underlying GCE 
infrastructure problem resolves itself between task attempts.  This can be 
reduced to two total attempts by eagerly deleting a cluster in the ERROR state 
before failing the current task attempt. 
   
   Clusters in the ERROR state cannot be used to submit Dataproc jobs via the 
Dataproc API. 
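
The proposed behavior can be sketched roughly as follows. This is a minimal, self-contained illustration and not the operator's actual code: `ClusterClient`, `FlakyClient`, `ClusterCreationError`, `create_cluster_eagerly` and `run_with_retries` are all hypothetical names, and cluster states are modeled as plain strings rather than the Dataproc API's types.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ClusterClient(Protocol):
    """Hypothetical minimal interface; NOT the Airflow hook or Dataproc client."""

    def create(self, name: str) -> str: ...
    def delete(self, name: str) -> None: ...


class ClusterCreationError(Exception):
    """Raised so the current task attempt fails and can be retried."""


def create_cluster_eagerly(client: ClusterClient, name: str) -> str:
    """Create a cluster; if it comes up in ERROR, delete it before failing."""
    state = client.create(name)
    if state == "ERROR":
        # Eager cleanup: the next attempt does not have to spend itself
        # on removing the broken cluster first.
        client.delete(name)
        raise ClusterCreationError(f"cluster {name!r} was created in ERROR state")
    return state


@dataclass
class FlakyClient:
    """In-memory stand-in: the first create yields ERROR, the second RUNNING."""

    states: list = field(default_factory=lambda: ["ERROR", "RUNNING"])
    clusters: dict = field(default_factory=dict)

    def create(self, name: str) -> str:
        if name in self.clusters:
            raise RuntimeError(f"cluster {name!r} already exists")
        self.clusters[name] = self.states.pop(0)
        return self.clusters[name]

    def delete(self, name: str) -> None:
        del self.clusters[name]


def run_with_retries(client: ClusterClient, name: str, attempts: int) -> int:
    """Simulate task retries; return the attempt number that succeeded."""
    for attempt in range(1, attempts + 1):
        try:
            create_cluster_eagerly(client, name)
            return attempt
        except ClusterCreationError:
            continue
    raise ClusterCreationError(f"cluster {name!r} not created after {attempts} attempts")
```

With the simulated client, `run_with_retries(FlakyClient(), "demo", 3)` succeeds on the second attempt, because the first attempt already removed the broken cluster before failing.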
   
   
   
   ### Use case/motivation
   
   Reducing the number of task attempts can reduce GCP costs, because the 
delay between retry attempts can be minutes.  There is no reason to keep a 
running, costly cluster in the ERROR state if the error can be detected in the 
initial create task. 
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

