baolsen opened a new issue #11011: URL: https://github.com/apache/airflow/issues/11011
**Apache Airflow version**: 1.10.8

**Environment**:

- **Cloud provider or hardware configuration**: 4 VCPU 8GB RAM VM
- **OS** (e.g. from /etc/os-release): RHEL 7.7
- **Kernel** (e.g. `uname -a`): Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
- **Install tools**:
- **Others**: Note that the AWS DataSync Operator is not available in this version; we added it manually via Plugins.

**What happened**:

The AWS DataSync service had a problem that left a Task Execution stuck in LAUNCHING for a long period of time. The DataSync Operator encountered a timeout exception (not an Airflow Timeout Exception, but one caused by token expiry in the underlying boto3 service). This exception caused the operator to terminate, but the Task Execution on AWS was still stuck in LAUNCHING.

Other Airflow DataSync Operator tasks started to pile up in QUEUED status and eventually timed out, also leaving their Task Executions in QUEUED state in AWS, blocked by the LAUNCHING task execution.

**What you expected to happen**:

The DataSync operator should by default cancel a task execution that is in progress if the operator terminates for any reason.

The AWS DataSync service can only run 1 DataSync task at a time (even when a task uses multiple DataSync agents). If one task gets stuck, any task submitted afterwards will not run, so a single stuck task puts all other DataSync tasks at risk.

So the operator should catch exceptions from `wait_for_task_execution` and cancel the task before re-raising the exception.

**How to reproduce it**:

Very difficult to reproduce without an AWS account and a DataSync appliance, plus the uncommon error conditions that cause a task to get irrecoverably stuck.

**Anything else we need to know**:

I authored the DataSync operator and have a working AWS account to test in. This issue can be assigned to me.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
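
The behavior requested in the issue (catch exceptions from `wait_for_task_execution`, cancel the task execution, then re-raise) could look roughly like the sketch below. It is only an illustration, not the operator's actual code: `run_and_cancel_on_failure` and `FakeHook` are hypothetical names, and the hook is assumed to expose `start_task_execution`, `wait_for_task_execution`, and `cancel_task_execution` methods that take the ARNs shown.

```python
import logging

log = logging.getLogger(__name__)


def run_and_cancel_on_failure(hook, task_arn):
    """Start a task execution; cancel it if waiting on it raises.

    `hook` is assumed (not confirmed by the issue) to provide
    start_task_execution, wait_for_task_execution and cancel_task_execution.
    """
    task_execution_arn = hook.start_task_execution(task_arn)
    try:
        hook.wait_for_task_execution(task_execution_arn)
    except Exception:
        # Covers errors such as the boto3 token expiry described in the issue.
        log.warning("Wait failed, cancelling %s", task_execution_arn)
        try:
            hook.cancel_task_execution(task_execution_arn)
        except Exception:
            # A failed cancel is logged but must not mask the original error.
            log.exception("Could not cancel %s", task_execution_arn)
        raise  # re-raise the original exception so the Airflow task still fails
    return task_execution_arn


class FakeHook:
    """Hypothetical stand-in for the DataSync hook, used only for this demo."""

    def __init__(self, fail_with=None):
        self.fail_with = fail_with
        self.cancelled = []

    def start_task_execution(self, task_arn):
        return task_arn + "/execution/exec-0001"

    def wait_for_task_execution(self, task_execution_arn):
        if self.fail_with is not None:
            raise self.fail_with
        return True

    def cancel_task_execution(self, task_execution_arn):
        self.cancelled.append(task_execution_arn)


if __name__ == "__main__":
    hook = FakeHook(fail_with=TimeoutError("token expired"))
    try:
        run_and_cancel_on_failure(hook, "arn:task-1")
    except TimeoutError:
        print("cancelled:", hook.cancelled)
```

Re-raising after the cancel keeps the Airflow task marked as failed while still freeing the single DataSync execution slot for queued tasks. Catching `Exception` is a choice made for this sketch; handling operator kills would likely also need the operator's `on_kill` path.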
