wfaria opened a new issue #9557:
URL: https://github.com/apache/airflow/issues/9557


   **Description**
   
   One day our production Airflow environment started to raise the following exception for multiple tasks in different DAGs:
   
   ```
   Try 0 out of 1
   
   Exception:
   
   Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?
   
   ```
   If we let the tasks retry, they would eventually finish. However, this increased the execution time of many ETLs by roughly 400%.
   
   Later, thanks to an obscure Stack Overflow answer, we discovered that this was happening because the Airflow database was overloaded. Probably the Airflow Scheduler started to receive multiple timeouts and concluded that healthy tasks had been killed, or something similar. We moved the database to a more powerful instance, and the exceptions stopped without any other change.
   
   **Use case / motivation**
   
   Firstly, the start of the error message, `Try 0 out of 1`, doesn't read naturally. The task never even ran, so I believe the message should flag that, showing something like `Try 0 out of 1 (failed to start the task)`.
   
   The rest of the exception message is not very helpful. I am far from an Airflow expert, but I believe the message could be more specific and vary according to the context of the exception. Here are some ideas, based on problems that could happen:
   
   * Executor reports task instance finished (failed) although the task says its queued. Was the task killed externally?
   * Executor reports task instance finished (failed) although the task says its queued. Is the Scheduler task healthy?
   * Executor reports task instance finished (failed) although the task says its queued. Is the Airflow connection with the Database OK?
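To make the proposal concrete, here is a minimal Python sketch of how the message could be assembled from a suspected cause, plus the suggested `Try 0 out of 1 (failed to start the task)` header. It assumes the scheduler can somehow classify the failure; `FailureContext`, `build_failure_message` and `format_try_header` are hypothetical names for illustration only and are not part of Airflow's API. How each cause would actually be detected is left open.

```python
from enum import Enum


class FailureContext(Enum):
    """Hypothetical causes the scheduler might be able to distinguish."""
    UNKNOWN = "unknown"
    SCHEDULER_UNHEALTHY = "scheduler_unhealthy"
    DATABASE_UNREACHABLE = "database_unreachable"


# Base text copied verbatim from the current message quoted in this issue.
BASE_MESSAGE = (
    "Executor reports task instance finished (failed) although the task says "
    "its queued."
)

# Hypothetical follow-up question appended for each suspected cause.
HINTS = {
    FailureContext.UNKNOWN: "Was the task killed externally?",
    FailureContext.SCHEDULER_UNHEALTHY: "Is the Scheduler task healthy?",
    FailureContext.DATABASE_UNREACHABLE: "Is the Airflow connection with the Database OK?",
}


def build_failure_message(context: FailureContext) -> str:
    """Join the fixed base text with the hint for the suspected cause."""
    return f"{BASE_MESSAGE} {HINTS[context]}"


def format_try_header(try_number: int, max_tries: int, started: bool) -> str:
    """Build the 'Try N out of M' header, flagging tasks that never started."""
    header = f"Try {try_number} out of {max_tries}"
    if not started:
        header += " (failed to start the task)"
    return header


if __name__ == "__main__":
    print(format_try_header(0, 1, started=False))
    print(build_failure_message(FailureContext.DATABASE_UNREACHABLE))
```

Keeping the base text fixed and only appending a hint means existing log searches for the current message would still match.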
   
   That's it. If possible, small changes like these would spare other people the problems I ran into recently.
   
   
   Thanks!
   

