[
https://issues.apache.org/jira/browse/AIRFLOW-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653846#comment-16653846
]
ASF GitHub Bot commented on AIRFLOW-3149:
-----------------------------------------
dossett opened a new pull request #4064: AIRFLOW-3149 Support dataproc cluster
deletion on ERROR
URL: https://github.com/apache/incubator-airflow/pull/4064
Sometimes a dataproc cluster creation results in a
cluster in the ERROR state, which makes it unusable.
Subsequent Airflow retries then fail because a cluster
with the same name already exists. This change adds the
option to delete an ERROR cluster on creation so that
subsequent attempts might succeed. There are also some
other small cleanups.
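The retry problem described in the commit message can be illustrated with a small simulation. This is a minimal sketch, not the PR's code: `FakeDataprocClient`, `create_cluster`, `run_with_retries` and `ClusterState` are hypothetical names, and the in-memory client stands in for the real Dataproc API. Only the `delete_cluster_on_error` name comes from the Jira issue. The PR's actual implementation may differ.

```python
from enum import Enum


class ClusterState(Enum):
    RUNNING = "RUNNING"
    ERROR = "ERROR"


class FakeDataprocClient:
    """Hypothetical in-memory stand-in for a Dataproc API client."""

    def __init__(self, fail_first_create=True):
        self.clusters = {}
        self._fail_next_create = fail_first_create

    def get_state(self, name):
        return self.clusters.get(name)

    def create(self, name):
        # A real API would reject a duplicate name; mimic that here.
        if name in self.clusters:
            raise RuntimeError(f"cluster {name!r} already exists")
        # The first creation "succeeds" as an API call but leaves the
        # cluster in ERROR, as described in the Jira issue.
        if self._fail_next_create:
            self._fail_next_create = False
            self.clusters[name] = ClusterState.ERROR
        else:
            self.clusters[name] = ClusterState.RUNNING
        return self.clusters[name]

    def delete(self, name):
        del self.clusters[name]


def create_cluster(client, name, delete_cluster_on_error=False):
    """Create a cluster, optionally deleting it if it lands in ERROR."""
    state = client.create(name)
    if state is ClusterState.ERROR:
        if delete_cluster_on_error:
            # Free the name so the next retry can create it again.
            client.delete(name)
        # Fail either way, so the task is marked failed and retried.
        raise RuntimeError(f"cluster {name!r} is in ERROR state")
    return state


def run_with_retries(client, name, retries, delete_cluster_on_error):
    """Mimic a task with `retries` extra attempts; return the final state."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return create_cluster(client, name, delete_cluster_on_error)
        except RuntimeError as exc:
            last_error = exc
    raise last_error


# With cleanup enabled, the second attempt succeeds.
print(run_with_retries(FakeDataprocClient(), "etl-cluster", retries=2,
                       delete_cluster_on_error=True))  # prints ClusterState.RUNNING
```

With `delete_cluster_on_error=False`, the first attempt leaves an ERROR cluster behind and every retry fails with "already exists". With `True`, the broken cluster is removed but the attempt still fails. Retrying is left to the scheduler rather than recreating the cluster inline.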
Make sure you have checked _all_ steps below.
### Jira
- [X] My PR addresses the following [Airflow
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3149/) issues and
references them in the PR title.
### Description
- [X] See commit message above
### Tests
- [ ] My PR adds the following unit tests __OR__ does not need testing for
this extremely good reason:
My change does not include tests; I did not find any existing integration tests
in the code base that this could fit into.
### Commits
- [X] My commits all reference Jira issues in their subject lines, and I
have squashed multiple commits if they address the same issue. In addition, my
commits follow the guidelines from "[How to write a good git commit
message](http://chris.beams.io/posts/git-commit/)":
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"
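Rules 1, 2, 3 and 5 above are mechanical enough to check with a short script. This is a minimal sketch, assuming the Jira reference is a leading `AIRFLOW-NNNN` token, optionally in square brackets. `check_commit_message` is a hypothetical helper, not part of Airflow's tooling. Rules 4 and 6 need human judgment and are not checked.

```python
import re

# Assumed Jira reference format: "AIRFLOW-1234" or "[AIRFLOW-1234]".
JIRA_PREFIX = re.compile(r"^\[?AIRFLOW-\d+\]?\s*")


def check_commit_message(message):
    """Return a list of rule violations found in a commit message."""
    lines = message.splitlines()
    if not lines:
        return ["message is empty"]
    problems = []
    subject = lines[0]
    # Rule 1: a body, if present, is separated by a blank line.
    if len(lines) > 1 and lines[1].strip():
        problems.append("subject not separated from body by a blank line")
    # Rule 2: at most 50 characters, not counting the Jira reference.
    if len(JIRA_PREFIX.sub("", subject)) > 50:
        problems.append("subject longer than 50 characters")
    # Rule 3: no trailing period on the subject.
    if subject.rstrip().endswith("."):
        problems.append("subject ends with a period")
    # Rule 5: body lines wrap at 72 characters (1-based line numbers).
    for number, line in enumerate(lines[1:], start=2):
        if len(line) > 72:
            problems.append(f"line {number} exceeds 72 characters")
    return problems


msg = ("AIRFLOW-3149 Support dataproc cluster deletion on ERROR\n"
       "\n"
       "Sometimes a dataproc cluster creation results in a\n"
       "cluster in the ERROR state, which makes it unusable.")
print(check_commit_message(msg))  # prints []
```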
### Documentation
- [X] In case of new functionality, my PR adds documentation that describes
how to use it.
- When adding new operators/hooks/sensors, the autoclass documentation
generation needs to be added.
### Code Quality
- [ ] Passes `flake8`
I am not sure how to test this.
----------------------------------------------------------------
> GCP dataproc cluster creation should have the option to delete an ERROR
> cluster
> -------------------------------------------------------------------------------
>
> Key: AIRFLOW-3149
> URL: https://issues.apache.org/jira/browse/AIRFLOW-3149
> Project: Apache Airflow
> Issue Type: Improvement
> Components: gcp
> Affects Versions: 1.10.0
> Reporter: Aaron Dossett
> Assignee: Aaron Dossett
> Priority: Minor
>
> We sometimes encounter issues where a dataproc cluster creation ends up in
> the ERROR state. That is, the cluster “exists” but is in the ERROR state[1]
> (it is not just that the cluster creation API call failed). This makes
> retries impossible: since the cluster name already exists, subsequent
> retried creations are guaranteed to fail.
> A `delete_cluster_on_error` parameter should be added to the
> `DataprocClusterCreateOperator` operator that controls whether or not an
> attempt to delete an ERROR cluster is made.
>
> [1] - I’ve seen that happen in two ways: 1) a purely transient error from GCP
> (`Internal server error` or the like), and 2) the request being rejected
> because it would exceed the project quota.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)