SameerMesiah97 opened a new issue, #60903:
URL: https://github.com/apache/airflow/issues/60903

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==9.20.0
   
   ### Apache Airflow version
   
   main
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   When using `EC2CreateInstanceOperator`, an EC2 instance can be created 
successfully even when the task execution role has only partial EC2 
permissions, for example when it lacks `ec2:DescribeInstances`.
   
   In this scenario, the operator successfully calls `RunInstances` and creates 
the EC2 instance. However, subsequent calls (such as describing or waiting for 
the instance when `wait_for_completion=True`) fail due to insufficient 
permissions. The task then fails, but the EC2 instance continues to exist and 
remains running in AWS, resulting in leaked infrastructure.
   
   ### What you think should happen instead
   
   If the operator fails after successfully creating an EC2 instance (for 
example due to missing `ec2:DescribeInstances` or other follow-up permissions), it 
should make a best-effort attempt to clean up the partially created resource by 
terminating the instance.
   
   Cleanup should be attempted opportunistically (i.e. only if the instance ID 
is known and the necessary permissions are available), and failure to clean up 
should not mask or replace the original exception.
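
The behaviour described above could look roughly like the following. This is a minimal sketch, not the operator's actual implementation: `create_instance_with_cleanup` is a hypothetical helper, and `ec2_client` is assumed to be a boto3 EC2 client (or anything exposing `run_instances`, `get_waiter`, and `terminate_instances`).

```python
import logging

log = logging.getLogger(__name__)


def create_instance_with_cleanup(ec2_client, run_kwargs, wait_for_completion=True):
    """Hypothetical helper: create instances, terminate them if a follow-up step fails."""
    response = ec2_client.run_instances(**run_kwargs)
    instance_ids = [inst["InstanceId"] for inst in response["Instances"]]
    try:
        if wait_for_completion:
            # The waiter polls DescribeInstances, so it fails without that permission.
            ec2_client.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    except Exception:
        log.warning("Follow-up step failed; attempting to terminate %s", instance_ids)
        try:
            ec2_client.terminate_instances(InstanceIds=instance_ids)
        except Exception:
            # Best effort only: log the cleanup failure and keep the original error.
            log.exception("Cleanup failed; %s may still be running", instance_ids)
        # Re-raise the original exception, not the cleanup error.
        raise
    return instance_ids
```

The bare `raise` sits outside the inner `try`/`except`, so the exception that reaches the caller is always the one from the failed follow-up step, whether or not termination succeeded.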
   
   ### How to reproduce
   
   1. Create an IAM role that allows `ec2:RunInstances` but denies 
`ec2:DescribeInstances`.
   2. Configure an AWS connection in Airflow using this role.
   3. Use the following DAG:
   ```
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.amazon.aws.operators.ec2 import EC2CreateInstanceOperator
   
   
   with DAG(
       dag_id="ec2_partial_auth_leak_repro",
       start_date=datetime(2025, 1, 1),
       schedule=None,
       catchup=False,
   ) as dag:
       create_instance = EC2CreateInstanceOperator(
           task_id="create_instance",
           aws_conn_id="aws_test_conn",
           image_id="ami-xxxxxxxxxxxxxxxxx",
           min_count=1,
           max_count=1,
           config={
               "SubnetId": "subnet-xxxxxxxxxxxxxxxxx",  # public subnet
               "SecurityGroupIds": ["sg-xxxxxxxxxxxxxxxxx"],
               "InstanceType": "t3.micro",
           },
           wait_for_completion=True,  # triggers DescribeInstances via waiter
       )
   ```
   4. Trigger the DAG.
   
   **Observed Result**
   The task fails because `ec2:DescribeInstances` is denied, but the EC2 
instance remains running in AWS and is not terminated automatically.
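
For step 1, a policy along these lines might be enough to reproduce the problem. It is an illustrative sketch only: the `Sid` values and the variable name are made up, `Resource` is left broad for simplicity, and a real account may need further permissions depending on how the instance is launched.

```python
import json

# Hypothetical IAM policy for the reproduction: RunInstances is allowed,
# DescribeInstances is explicitly denied.
partial_ec2_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowRunInstances",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "*",
        },
        {
            "Sid": "DenyDescribeInstances",
            "Effect": "Deny",
            "Action": "ec2:DescribeInstances",
            "Resource": "*",
        },
    ],
}

# Print the JSON document that would be attached to the role.
print(json.dumps(partial_ec2_policy, indent=2))
```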
   
   ### Anything else
   
   This behavior can be surprising and potentially costly, as infrastructure is 
created even though the Airflow task fails. Other Airflow operators that manage 
external resources typically attempt best-effort cleanup on failure to avoid 
leaking infrastructure.
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

