rafidka commented on issue #13824:
URL: https://github.com/apache/airflow/issues/13824#issuecomment-1055701303


   I did some dive deep on this issue and root caused it.
   
   **TL;DR** It is related to `watchtower` and has been fixed in version 2.0.0 
and later versions. Airflow 2.2.4 now uses that version (see #19907). So, if 
you are using it, you should be good. If not, then you need to force install 
`watchtower` version 2.0.0 or later. Notice, however, that `watchtower` made 
some changes to the `CloudWatchLogHandler` `__init__ ` method so you need to 
update the relevant code.
    
   ---
   
   The issue is related to a combination of three factors: forking + threading 
+ logging. This combination can lead to a deadlock when logs are being flushed 
after a task finishes execution. This means that the StandardTaskRunner will be 
stuck at [this 
line](https://github.com/apache/airflow/blob/10023fdd65fa78033e7125d3d8103b63c127056e/airflow/task/task_runner/standard_task_runner.py#L92).
 Now, since the task has actually finished (thus its state is success), but the 
process didn't yet exit (and will never since it is in a deadlock state), the 
`heartbeat_callback` will end up thinking that the task state was set 
externally and issue the following warning and then SIGTERM the task:
   
   ```
   State of this instance has been externally set to success. Terminating 
instance.
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to