baolsen opened a new issue #11011:
URL: https://github.com/apache/airflow/issues/11011


   **Apache Airflow version**: 1.10.8
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: 4 VCPU 8GB RAM VM
   - **OS** (e.g. from /etc/os-release): RHEL 7.7
   - **Kernel** (e.g. `uname -a`): Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 
20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
   - **Install tools**:
   - **Others**:
   Note that the AWS DataSync Operator is not available in this version; we 
manually added it via Plugins.
   
   **What happened**:
   
   The AWS DataSync service had a problem that left a Task Execution stuck in 
LAUNCHING for a long period of time.
   The DataSync Operator encountered a timeout exception (not an Airflow 
timeout exception, but one caused by token expiry in the underlying boto3 
service).
   This exception caused the operator to terminate, but the Task Execution on 
AWS remained stuck in LAUNCHING.
   
   Other Airflow DataSync Operator tasks then piled up in QUEUED status and 
eventually timed out, also leaving their Task Executions in QUEUED state in 
AWS, blocked by the LAUNCHING Task Execution.
   
   **What you expected to happen**:
   
   The DataSync operator should, by default, cancel any in-progress task 
execution if the operator terminates for any reason.
   
   The AWS DataSync service can only run 1 DataSync task at a time (even when a 
task uses multiple DataSync agents). One stuck task therefore puts all other 
DataSync tasks at risk: any task submitted afterwards will not run.
   
   So the operator should catch exceptions raised by `wait_for_task_execution` 
and cancel the task execution before re-raising the exception.
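
   A minimal sketch of the proposed behaviour, not the operator's actual code. 
`FakeDataSyncHook` and `run_and_wait` are hypothetical names used only for 
illustration, and the hook method signatures are assumptions. The stand-in 
hook lets the snippet run without an AWS account.

```python
class FakeDataSyncHook:
    """Hypothetical stand-in for the DataSync hook (signatures are assumed)."""

    def __init__(self, fail_with=None):
        self.fail_with = fail_with  # exception to raise while waiting, if any
        self.cancelled = []         # ARNs for which a cancel was requested

    def wait_for_task_execution(self, task_execution_arn):
        # Simulate a failure while polling, e.g. an expired boto3 token.
        if self.fail_with is not None:
            raise self.fail_with
        return True

    def cancel_task_execution(self, task_execution_arn):
        self.cancelled.append(task_execution_arn)


def run_and_wait(hook, task_execution_arn):
    """Wait for the task execution; cancel it if waiting fails, then re-raise."""
    try:
        return hook.wait_for_task_execution(task_execution_arn)
    except Exception:
        # Best-effort cleanup so a stuck execution does not block later tasks.
        try:
            hook.cancel_task_execution(task_execution_arn)
        except Exception:
            pass  # a real operator would log this; the original error matters more
        raise  # re-raise the original exception from the wait
```

   The cancel call is wrapped in its own `try` so that a failure during cleanup 
does not mask the original error.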
   
   **How to reproduce it**:
   
   Very difficult to reproduce without an AWS account and DataSync appliance, 
and the uncommon error conditions which cause a task to get irrecoverably stuck.
   
   **Anything else we need to know**:
   
   I authored the DataSync operator and have a working AWS Account to test in. 
This issue can be assigned to me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

