AchimGaedkeLynker opened a new pull request, #36828:
URL: https://github.com/apache/airflow/pull/36828

   This PR terminates EC2 instances that are still being created when a task 
instance is stopped (externally marked for retry) before the instance IDs are 
returned. It uses the `on_kill` method to shut down the EC2 instances.
   
   Use case/problem:
   
   The `EC2CreateInstanceOperator` is often used in a setup task (see 
https://airflow.apache.org/docs/apache-airflow/stable/howto/setup-and-teardown.html
 ). If the task inside the DAG is cancelled while an 
`EC2CreateInstanceOperator` task is still running, that task is killed. This is 
likely when the AWS initialization and/or the post-processing step takes 
considerable time (large instance storage, long cloud-init processes) and 
`wait_for_completion` is `True`.
   
   Without the `on_kill` cleanup code, the partially initialized instances (i.e. 
those whose instance IDs were not yet sent to XCom) will not be terminated by 
the teardown task.
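
   To illustrate the idea, here is a minimal sketch of the cleanup pattern. It 
is not the code in this PR: `CreateInstancesSketch` and the AMI ID are 
hypothetical, the EC2 client is a `MagicMock` standing in for a boto3 client, 
and the kill is simulated by an exception raised from the waiter. Only 
`on_kill`, `wait_for_completion` and `_on_kill_instance_ids` are names taken 
from the description above.

```python
from unittest.mock import MagicMock


class CreateInstancesSketch:
    """Hypothetical stand-in for the operator; not the code in this PR."""

    def __init__(self, ec2_client, image_id, wait_for_completion=True):
        self.ec2_client = ec2_client
        self.image_id = image_id
        self.wait_for_completion = wait_for_completion
        # IDs of instances that were requested but not yet handed over via XCom.
        self._on_kill_instance_ids = []

    def execute(self):
        response = self.ec2_client.run_instances(
            ImageId=self.image_id, MinCount=1, MaxCount=1
        )
        instance_ids = [item["InstanceId"] for item in response["Instances"]]
        # Remember the IDs before the potentially long wait starts.
        self._on_kill_instance_ids = instance_ids
        if self.wait_for_completion:
            waiter = self.ec2_client.get_waiter("instance_running")
            waiter.wait(InstanceIds=instance_ids)
        # Normal completion: the IDs are returned, so teardown can find them.
        # Whether this reset is strictly needed is the open question below.
        self._on_kill_instance_ids = []
        return instance_ids

    def on_kill(self):
        # Terminate only instances whose IDs were never returned.
        if self._on_kill_instance_ids:
            self.ec2_client.terminate_instances(
                InstanceIds=self._on_kill_instance_ids
            )
            self._on_kill_instance_ids = []


# Simulate a task that is killed while waiting for the instance to start.
client = MagicMock()
client.run_instances.return_value = {"Instances": [{"InstanceId": "i-0123"}]}
client.get_waiter.return_value.wait.side_effect = RuntimeError("task killed")

op = CreateInstancesSketch(client, image_id="ami-hypothetical")
try:
    op.execute()
except RuntimeError:
    # In Airflow the framework would invoke on_kill; here it is called by hand.
    op.on_kill()

client.terminate_instances.assert_called_once_with(InstanceIds=["i-0123"])
```

   In this sketch a run that completes normally leaves nothing for `on_kill` 
to terminate, so the teardown task stays responsible for instances whose IDs 
were returned.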
   
   This happened to me several times before I found the cause and fixed it.
   
   Alternative: split the setup into many small tasks: `EC2CreateInstance` as a 
setup (without `wait_for_completion`), wait on instance state, wait on 
cloud-init setup to finish, ... `EC2TerminateInstance` as teardown.
   
   Questions:
   
   * Is it required to delete the `_on_kill_instance_ids`? What comes after 
`execute` in the TI lifecycle?
   * How to unit-test this?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
