dossett commented on issue #4064: AIRFLOW-3149 Support dataproc cluster deletion on ERROR
URL: https://github.com/apache/incubator-airflow/pull/4064#issuecomment-431002257

Hi @fenglu-g, thanks for your comment. My goal wasn't just to make sure the ERROR cluster gets deleted, but to give the cluster creation a chance to succeed on a retry. The behavior we have observed is this:

- Sometimes a cluster create fails and the cluster exists in an ERROR state
- the cluster create operator retries based on our DAG configuration
- the retries fail because a cluster with the same name already exists in the ERROR state
- after the retries are exhausted the DAG proceeds with that step as failed

After applying this patch internally we observe:

- Sometimes a cluster create fails and the cluster exists in an ERROR state
- We immediately delete the cluster within the create operator
- the cluster create operator retries based on our DAG configuration
- the cluster creation succeeds because whatever led to the initial cluster creation ERROR was a transient problem

It has greatly increased the reliability and stability of our GCP DAGs.
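
To make the two sequences above concrete, here is a minimal sketch of the retry behavior with and without the immediate delete. It is not the code from the patch: `FakeDataprocClient`, `create_cluster`, `run_with_retries`, and the `delete_on_error` flag are hypothetical names invented for illustration, and the client is an in-memory stand-in rather than the real Dataproc API.

```python
class ClusterCreateError(Exception):
    """Raised when a create attempt does not yield a usable cluster."""


class FakeDataprocClient:
    """Hypothetical in-memory stand-in for a Dataproc client (not the real API)."""

    def __init__(self, transient_failures=1):
        self.clusters = {}  # cluster name -> state
        self.transient_failures = transient_failures

    def create(self, name):
        # A cluster left behind in any state blocks re-creation under that name.
        if name in self.clusters:
            raise ClusterCreateError(f"cluster {name!r} already exists")
        if self.transient_failures > 0:
            # Simulate a transient problem: the cluster is created but in ERROR.
            self.transient_failures -= 1
            self.clusters[name] = "ERROR"
        else:
            self.clusters[name] = "RUNNING"
        return self.clusters[name]

    def delete(self, name):
        self.clusters.pop(name, None)


def create_cluster(client, name, delete_on_error=True):
    """One create attempt; optionally removes a cluster left in ERROR."""
    state = client.create(name)
    if state == "ERROR":
        if delete_on_error:
            # Clean up right away so the next retry can reuse the name.
            client.delete(name)
        raise ClusterCreateError(f"cluster {name!r} entered ERROR state")
    return state


def run_with_retries(client, name, retries, delete_on_error):
    """Mimics task-level retries: one initial attempt plus `retries` more."""
    for _ in range(retries + 1):
        try:
            return create_cluster(client, name, delete_on_error)
        except ClusterCreateError:
            continue
    return "FAILED"
```

With one simulated transient failure and two retries, `run_with_retries(FakeDataprocClient(1), "etl", 2, delete_on_error=False)` exhausts its retries because the ERROR cluster keeps blocking the name, while the same call with `delete_on_error=True` succeeds on the second attempt.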
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
