SameerMesiah97 opened a new pull request, #61010: URL: https://github.com/apache/airflow/pull/61010
**Description**

Added best-effort cleanup to `EmrCreateJobFlowOperator` to terminate EMR clusters when a failure occurs after successful cluster creation.

In certain failure modes, the operator could previously create a cluster via `create_job_flow` and then fail during a later execution step (for example, while waiting for completion when `DescribeCluster` permissions are missing). In these cases, the task failed but left the cluster running.

The operator now attempts to terminate the created job flow if an exception is raised after creation. Cleanup is best-effort and does not override or mask the original exception.

This change applies the same failure-handling approach recently introduced for EC2 infrastructure operators in PR #60904.

**Rationale**

`EmrCreateJobFlowOperator` is responsible for provisioning and coordinating an external, stateful service whose lifecycle extends beyond task execution. If the task fails after cluster creation, Airflow can no longer reliably manage or observe the cluster’s state.

Adding opportunistic cleanup in these scenarios reduces the risk of orphaned EMR clusters and unexpected infrastructure costs, while preserving existing failure semantics. Cleanup errors are logged and do not affect the task’s final failure state.

**Tests**

* Added a unit test covering failure after cluster creation and verifying that termination is attempted.
* Added a unit test ensuring cleanup failures do not mask the original exception.

**Backwards Compatibility**

No changes to the public API or operator parameters.

**Reproducibility**

The failure scenario could not be reproduced directly because of permission limits on my personal AWS account. However, based on the current control flow of `EmrCreateJobFlowOperator`, cluster creation can succeed while a later step fails, leaving the EMR cluster running without cleanup. This change defensively addresses that case.
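The control flow described above can be illustrated without AWS access. The following is a minimal sketch, not the operator's actual code: `FakeEmrClient` and `create_and_wait` are hypothetical names introduced here, and the fake only borrows the method names `run_job_flow`, `describe_cluster`, and `terminate_job_flows` from the boto3 EMR client. It simulates creation succeeding, a later step failing on missing `DescribeCluster` permissions, and a best-effort termination that never replaces the original exception.

```python
import logging

log = logging.getLogger(__name__)


class FakeEmrClient:
    """Hypothetical in-memory stand-in for an EMR client (assumption, not boto3)."""

    def __init__(self, fail_terminate=False):
        self.fail_terminate = fail_terminate
        self.terminated = []

    def run_job_flow(self, **kwargs):
        # Creation always succeeds in this simulation.
        return {"JobFlowId": "j-FAKE123"}

    def describe_cluster(self, ClusterId):
        # Simulates the later step failing due to missing permissions.
        raise PermissionError("not authorized to perform DescribeCluster")

    def terminate_job_flows(self, JobFlowIds):
        if self.fail_terminate:
            raise RuntimeError("terminate failed")
        self.terminated.extend(JobFlowIds)


def create_and_wait(client):
    """Create a cluster, then run a later step with best-effort cleanup."""
    job_flow_id = client.run_job_flow(Name="demo")["JobFlowId"]
    try:
        # Any step after creation may fail; here it always does.
        client.describe_cluster(ClusterId=job_flow_id)
    except Exception:
        try:
            client.terminate_job_flows(JobFlowIds=[job_flow_id])
        except Exception:
            # Cleanup errors are logged only, so they cannot mask the original.
            log.exception("Best-effort cleanup of %s failed", job_flow_id)
        raise  # re-raises the original post-creation exception
    return job_flow_id
```

Calling `create_and_wait(FakeEmrClient())` raises `PermissionError` after recording the termination request; with `FakeEmrClient(fail_terminate=True)` the same `PermissionError` still propagates and the cleanup failure is only logged.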
Contributors reading this PR are welcome to provide a reproduction of the failure mode described above if they can.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
