shivannakarthik opened a new issue, #59812:
URL: https://github.com/apache/airflow/issues/59812
### Apache Airflow Provider(s)
google
### Versions of Apache Airflow Providers
_No response_
### Apache Airflow version
main
### Operating System
ubuntu
### Deployment
Astronomer
### Deployment details
_No response_
### What happened
When implementing the Ephemeral Dataproc Cluster pattern:
`Create Cluster` -> `Run Jobs` -> `Delete Cluster (TriggerRule.ALL_DONE)`
There is a conflict between the default behavior of
`DataprocCreateClusterOperator` and the downstream
`DataprocDeleteClusterOperator`.
1. `DataprocCreateClusterOperator` has `delete_on_error=True` by default. If
the cluster creation fails and ends up in an `ERROR` state, the operator
automatically deletes the cluster.
2. The downstream `DataprocDeleteClusterOperator` triggers (due to
`TriggerRule.ALL_DONE`).
3. It attempts to delete the cluster which no longer exists.
4. The `DataprocDeleteClusterOperator` fails with a `NotFound` (404) error
from the Google Cloud API.
This marks the cleanup task as `failed`, which adds noise and can mask the
actual upstream failure in monitoring views.
### What you think should happen instead
`DataprocDeleteClusterOperator` should ideally be idempotent. If the cluster
has already been deleted (the API returns 404 `NotFound`), the operator should
mark the task successful (or skipped) rather than failed.
Currently, the `deferrable` mode implementation checks for existence:
```python
try:
    hook.get_cluster(...)
except NotFound:
    self.log.info("Cluster deleted.")
    return
```
However, the standard synchronous `execute` path does not seem to catch
`NotFound` exceptions during the delete operation.
### How to reproduce
1. Create a DAG with `DataprocCreateClusterOperator` ->
`DataprocDeleteClusterOperator` (with `trigger_rule=TriggerRule.ALL_DONE`).
2. Force the cluster creation to enter an ERROR state (e.g., by providing
invalid configuration that passes validation but fails provisioning).
3. `DataprocCreateClusterOperator` will delete the cluster and fail.
4. `DataprocDeleteClusterOperator` will run, attempt to delete the missing
cluster, and fail with `NotFound`.
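The sequence above can also be modelled without a GCP project. This is a minimal, dependency-free sketch and not the provider's code: `FakeDataprocHook`, `create_cluster_task`, `delete_cluster_task`, and `run_pipeline` are hypothetical names made up for illustration, and the local `NotFound` class stands in for `google.api_core.exceptions.NotFound`.

```python
class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class FakeDataprocHook:
    """Hypothetical in-memory model of the cluster API; not the real DataprocHook."""

    def __init__(self):
        self.clusters = {}  # cluster name -> state

    def create_cluster(self, name, valid_config):
        # A provisioning failure leaves the cluster behind in the ERROR state.
        self.clusters[name] = "RUNNING" if valid_config else "ERROR"
        return self.clusters[name]

    def delete_cluster(self, name):
        if name not in self.clusters:
            raise NotFound(f"Cluster {name} not found")
        del self.clusters[name]


def create_cluster_task(hook, name, valid_config, delete_on_error=True):
    """Models the create step: on ERROR, optionally delete, then fail the task."""
    state = hook.create_cluster(name, valid_config)
    if state == "ERROR":
        if delete_on_error:
            hook.delete_cluster(name)
        raise RuntimeError(f"Cluster {name} entered ERROR state")


def delete_cluster_task(hook, name):
    """Models the cleanup step as described in this issue: NotFound propagates."""
    hook.delete_cluster(name)


def run_pipeline(hook, name, valid_config):
    """Runs create, then cleanup regardless of outcome (ALL_DONE semantics)."""
    outcomes = {}
    try:
        create_cluster_task(hook, name, valid_config)
        outcomes["create"] = "success"
    except RuntimeError:
        outcomes["create"] = "failed"
    try:
        delete_cluster_task(hook, name)
        outcomes["delete"] = "success"
    except NotFound:
        outcomes["delete"] = "failed"
    return outcomes


if __name__ == "__main__":
    print(run_pipeline(FakeDataprocHook(), "ephemeral", valid_config=False))
```

In this model a failed creation leaves both tasks failed, while a successful creation leaves both succeeded, which matches the behaviour reported here.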
### Anything else
Proposed behaviour:
1. Update `DataprocDeleteClusterOperator` to catch `NotFound` exceptions
during the delete operation and log a message instead of raising an error.
2. Alternatively, update documentation to explicitly recommend setting
`delete_on_error=False` in `DataprocCreateClusterOperator` when an explicit
delete task is used.
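The first option could look roughly like the sketch below. It only illustrates the pattern and is not a patch against the operator: `FakeHook` and `delete_cluster_idempotent` are hypothetical names, and the local `NotFound` class stands in for `google.api_core.exceptions.NotFound` so the example runs without the Google client libraries.

```python
import logging

log = logging.getLogger(__name__)


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class FakeHook:
    """Hypothetical in-memory hook; only models delete_cluster."""

    def __init__(self, clusters=()):
        self.clusters = set(clusters)

    def delete_cluster(self, cluster_name):
        if cluster_name not in self.clusters:
            raise NotFound(f"Cluster {cluster_name} not found")
        self.clusters.remove(cluster_name)


def delete_cluster_idempotent(hook, cluster_name):
    """Delete the cluster; treat an already-missing cluster as success.

    Returns True if a cluster was deleted, False if it was already gone.
    """
    try:
        hook.delete_cluster(cluster_name)
    except NotFound:
        # The desired end state (no cluster) already holds, so do not fail.
        log.info("Cluster %s already deleted; nothing to do.", cluster_name)
        return False
    return True
```

Only `NotFound` is caught, so any other API error (for example a permissions problem) would still fail the task.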
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)