AchimGaedkeLynker opened a new pull request, #36828: URL: https://github.com/apache/airflow/pull/36828
This PR terminates instances that are still being created when a task instance is stopped (externally marked for retry) before the instance IDs are returned. It uses the `on_kill` method to shut down the EC2 instances.

Use case/problem: The `EC2CreateInstanceOperator` is often used in a setup task (see https://airflow.apache.org/docs/apache-airflow/stable/howto/setup-and-teardown.html). If the task inside the DAG is cancelled, a (potentially) ongoing `EC2CreateInstanceOperator` task is killed. This is likely if the AWS initialization and/or the post-processing step take considerable time (large instance storage, long cloud-init processes) and `wait_for_completion` is `True`. Without the `on_kill` cleanup code, the partially initialized instances (i.e. those whose instance IDs were not sent to XCom) will not be terminated by the teardown task. This happened to me several times before I finally found the cause and fixed it.

Alternative: Split the setup into many small tasks: `EC2CreateInstance` as a setup (without `wait_for_completion`), wait on instance state, wait on cloud-init setup to finish, ... `EC2TerminateInstance` as teardown.

Questions:

* Is it required to delete the `_on_kill_instance_ids`? What comes after `execute` in the TI lifecycle?
* How to unit-test this?
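To illustrate the idea, and one possible way to unit-test it, here is a minimal sketch. It is not the code in this PR: `CreateInstanceSketch`, `TaskKilled`, `simulate_kill_during_wait`, and the `image_id` parameter are hypothetical names, the class is a plain Python stand-in rather than an Airflow operator, and the EC2 client is a `MagicMock` that only mimics the boto3 calls `run_instances`, `get_waiter`, and `terminate_instances`. Resetting `_on_kill_instance_ids` after a successful `execute` is an assumption made for the sketch, not an answer to the lifecycle question above.

```python
from unittest.mock import MagicMock


class TaskKilled(Exception):
    """Hypothetical stand-in for the task being stopped externally."""


class CreateInstanceSketch:
    """Plain-Python stand-in for the operator; not the Airflow class."""

    def __init__(self, ec2_client, image_id, wait_for_completion=True):
        self.ec2_client = ec2_client
        self.image_id = image_id
        self.wait_for_completion = wait_for_completion
        # IDs of instances that were requested but not yet returned to the caller.
        self._on_kill_instance_ids = []

    def execute(self):
        response = self.ec2_client.run_instances(
            ImageId=self.image_id, MinCount=1, MaxCount=1
        )
        instance_ids = [inst["InstanceId"] for inst in response["Instances"]]
        # Remember the IDs before the potentially long wait starts.
        self._on_kill_instance_ids = instance_ids
        if self.wait_for_completion:
            waiter = self.ec2_client.get_waiter("instance_running")
            waiter.wait(InstanceIds=instance_ids)
        # Finished normally: the IDs are returned, so teardown can handle them.
        self._on_kill_instance_ids = []
        return instance_ids

    def on_kill(self):
        # Terminate only instances whose IDs were never returned.
        if self._on_kill_instance_ids:
            self.ec2_client.terminate_instances(
                InstanceIds=self._on_kill_instance_ids
            )
            self._on_kill_instance_ids = []


def simulate_kill_during_wait():
    """Interrupt execute() inside the waiter, then call on_kill()."""
    client = MagicMock()
    client.run_instances.return_value = {"Instances": [{"InstanceId": "i-0123"}]}
    # The waiter raising TaskKilled stands in for an external stop.
    client.get_waiter.return_value.wait.side_effect = TaskKilled
    op = CreateInstanceSketch(client, image_id="ami-hypothetical")
    try:
        op.execute()
    except TaskKilled:
        op.on_kill()
    return client


def simulate_normal_run():
    """Let execute() finish, then call on_kill() to check it is a no-op."""
    client = MagicMock()
    client.run_instances.return_value = {"Instances": [{"InstanceId": "i-0123"}]}
    op = CreateInstanceSketch(client, image_id="ami-hypothetical")
    instance_ids = op.execute()
    op.on_kill()
    return client, instance_ids
```

In a real test the mock would presumably be attached to the operator's hook instead of being passed to the constructor; the assertions on `terminate_instances` would stay the same.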
