Chris7 opened a new issue #21225:
URL: https://github.com/apache/airflow/issues/21225


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   Tasks are stuck in the queued state and will not be scheduled for execution. 
In the logs, these tasks have a message of `could not queue task <task 
details>`, as they are currently in the `queued` or `running` sets in the 
executor.
   
   ### What you expected to happen
   
   Tasks run :)
   
   ### How to reproduce
   
   We have a Dockerized Airflow setup, using Celery with a RabbitMQ broker and 
Postgres as the result backend. When rolling out DAG updates, we redeploy most of 
the components (the workers, scheduler, webserver, and RabbitMQ). We can have a 
few thousand DagRuns at a given time. This error seems to happen during a load 
spike when a deployment occurs.
   
   Looking at the code, this is what I believe is happening:
   
   Starting from the initial debug message of `could not queue task`, I found 
the tasks are marked as running in the executor (but in the UI they still appear 
as queued): 
https://github.com/apache/airflow/blob/main/airflow/executors/base_executor.py#L85
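   For reference, a minimal, paraphrased sketch of that queuing check (not the exact Airflow source; the class and names here are simplified for illustration):

```python
# Hypothetical simplification of the BaseExecutor queuing check.
# If a task key is already in the executor's `queued_tasks` dict or
# `running` set, the executor refuses to queue it again and only logs
# the "could not queue task" message.

class SketchExecutor:
    def __init__(self):
        self.queued_tasks = {}  # task key -> command
        self.running = set()    # task keys the executor believes are running

    def queue_command(self, key, command):
        if key not in self.queued_tasks and key not in self.running:
            self.queued_tasks[key] = command
            return True
        print(f"could not queue task {key}")
        return False
```

   Once the adoption path has put a task key into `running`, every later attempt to queue that same key hits the failure branch and only logs the error, matching what we see in our logs.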
   
   Tracking through our logs, I see these tasks are recovered by the adoption 
code, and the state there is STARTED 
(https://github.com/apache/airflow/blob/main/airflow/executors/celery_executor.py#L540).
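   A hypothetical simplification of that adoption step (the function and state names below are assumptions for illustration, not the real Airflow/Celery signatures): on scheduler restart, tasks whose result-backend state is non-terminal get adopted into the new executor's `running` set, even though the Airflow TaskInstance row may still say queued.

```python
from enum import Enum

class CeleryState(Enum):
    PENDING = "PENDING"
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"

def adopt(executor_running: set, celery_results: dict) -> set:
    """celery_results maps task key -> state read from the result backend."""
    for key, state in celery_results.items():
        # Non-terminal results (STARTED, PENDING) are treated as still
        # in flight, so the adopting executor records them as running.
        if state in (CeleryState.STARTED, CeleryState.PENDING):
            executor_running.add(key)
    return executor_running
```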
 
   
   Following the state update code, I see this does not cause any state updates 
to occur in Airflow 
(https://github.com/apache/airflow/blob/main/airflow/executors/celery_executor.py#L465).
 Thus, if a task is marked as STARTED in the result backend but queued in the 
Airflow task state, it will never be transitioned out of queued by the scheduler. 
However, you can get these tasks to finally run by clicking the `run` button.
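   The net effect can be sketched as follows (a paraphrase of the state-sync mapping with simplified, hypothetical names; the point is that only terminal Celery states change the Airflow-side state, so STARTED falls through):

```python
from enum import Enum

class CeleryState(Enum):
    PENDING = "PENDING"
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"

def sync_state(airflow_state: str, celery_state: CeleryState) -> str:
    # Only terminal Celery states trigger an update on the Airflow side.
    if celery_state is CeleryState.SUCCESS:
        return "success"
    if celery_state is CeleryState.FAILURE:
        return "failed"
    # STARTED (and PENDING) fall through: the Airflow task state is left
    # untouched, so a task that is "queued" in Airflow stays queued until
    # something external (e.g. the `run` button) moves it.
    return airflow_state
```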
   
   ### Operating System
   
   Ubuntu 20.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   Since this happens during load peaks, I believe it is a race condition 
between task execution and the recording of task state.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

