dossett commented on issue #4064: AIRFLOW-3149 Support dataproc cluster 
deletion on ERROR
URL: 
https://github.com/apache/incubator-airflow/pull/4064#issuecomment-431002257
 
 
   Hi @fenglu-g, thanks for your comment. My goal wasn't just to make sure the 
ERROR cluster gets deleted, but to give the cluster creation a chance to succeed 
on a retry. 
   
   The behavior we have observed is this:
   - Sometimes a cluster create fails and the cluster exists in an ERROR state
   - the cluster create operator retries based on our DAG configuration
   - the retries fail because a cluster with the same name already exists in 
the ERROR state
   - after the retries are exhausted the DAG proceeds with that step as 
failed
   
   After applying this patch internally we observe:
   - Sometimes a cluster create fails and the cluster exists in an ERROR state
   - We immediately delete the cluster within the create operator
   - the cluster create operator retries based on our DAG configuration
   - the cluster creation succeeds because whatever led to the initial cluster 
creation ERROR was a transient problem
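   
   A minimal sketch of the two behaviors above, for illustration only. It is 
not the actual patch: `FakeDataprocClient`, `create_cluster`, 
`run_with_retries` and the `delete_on_error` flag are hypothetical names, and 
the in-memory client only stands in for the real Dataproc API.

```python
class FakeDataprocClient:
    """Hypothetical in-memory stand-in for the Dataproc API (illustration only)."""

    def __init__(self, fail_first_n=1):
        self.clusters = {}                  # cluster name -> state
        self.fail_remaining = fail_first_n  # number of transient failures to simulate

    def create(self, name):
        # A cluster left behind in any state blocks re-creation under the same name.
        if name in self.clusters:
            raise RuntimeError("cluster %s already exists" % name)
        if self.fail_remaining > 0:
            self.fail_remaining -= 1
            self.clusters[name] = "ERROR"
        else:
            self.clusters[name] = "RUNNING"
        return self.clusters[name]

    def delete(self, name):
        self.clusters.pop(name, None)


def create_cluster(client, name, delete_on_error=True):
    """One attempt of the create task; raises so the scheduler can retry."""
    state = client.create(name)
    if state == "ERROR":
        if delete_on_error:
            # Clean up immediately so the next retry can reuse the name.
            client.delete(name)
        raise RuntimeError("cluster %s entered ERROR state" % name)
    return state


def run_with_retries(client, name, retries, delete_on_error):
    """Stand-in for the task retries configured on the DAG."""
    for _ in range(retries + 1):
        try:
            return create_cluster(client, name, delete_on_error)
        except RuntimeError:
            continue
    return "FAILED"
```

   With one simulated transient failure and two retries, the run without 
deletion ends as failed because every retry hits the leftover ERROR cluster, 
while the run with deletion succeeds on the first retry.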
   
   It has greatly increased the reliability and stability of our GCP DAGs.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
