Re: [PR] EksCreateNodegroupOperator could leave nodegroups running after failure [airflow]

via GitHub Wed, 28 Jan 2026 06:23:17 -0800


SameerMesiah97 commented on code in PR #61145:
URL: https://github.com/apache/airflow/pull/61145#discussion_r2736881208



##########
providers/amazon/src/airflow/providers/amazon/aws/operators/eks.py:
##########
@@ -104,17 +104,43 @@ def _create_compute(
             nodeRole=nodegroup_role_arn,
             **create_nodegroup_kwargs,
         )
-        if wait_for_completion:
-            log.info("Waiting for nodegroup to provision.  This will take some 
time.")
-            wait(
-                waiter=eks_hook.conn.get_waiter("nodegroup_active"),
-                waiter_delay=waiter_delay,
-                waiter_max_attempts=waiter_max_attempts,
-                args={"clusterName": cluster_name, "nodegroupName": 
nodegroup_name},
-                failure_message="Nodegroup creation failed",
-                status_message="Nodegroup status is",
-                status_args=["nodegroup.status"],
+        try:
+            if wait_for_completion:
+                log.info("Waiting for nodegroup to provision.  This will take 
some time.")
+                wait(
+                    waiter=eks_hook.conn.get_waiter("nodegroup_active"),
+                    waiter_delay=waiter_delay,
+                    waiter_max_attempts=waiter_max_attempts,
+                    args={"clusterName": cluster_name, "nodegroupName": 
nodegroup_name},
+                    failure_message="Nodegroup creation failed",
+                    status_message="Nodegroup status is",
+                    status_args=["nodegroup.status"],
+                )
+        except Exception:

Review Comment:
   > I agree. We should:
   >
   > * Filter the exceptions
   > * Have a flag to configure whether we want that auto deletion in case of 
failure
   
   Exactly what I was considering when I saw the comment by @o-nikolas .
   
   For the exceptions, do you want to limit it to WaiterError for now? And for 
the flag, what are your thoughts on opt-in (no cleanup by default) or opt-out 
(cleanup by default). I believe it should be opt-out because a lot of users 
might not initially be aware of this flag and then learn of it later after 
experiencing a bad resource leak. And if the user prefers observability over 
hard prevention of resource leaks, it should be an intentional decision. In 
either case, the user should be notified of the orphaned node group via 
logging. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] EksCreateNodegroupOperator could leave nodegroups running after failure [airflow]

Reply via email to