[
https://issues.apache.org/jira/browse/AIRFLOW-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998489#comment-16998489
]
jack commented on AIRFLOW-5171:
-------------------------------
I can confirm this bug also exists in 1.10.3 with the LocalExecutor.
Sadly, for us it also seems to be random in nature.
I have not yet been able to find any common ground between the affected
tasks, let alone an example that reproduces it :(
> Random task gets stuck in queued state despite all dependencies met
> -------------------------------------------------------------------
>
> Key: AIRFLOW-5171
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5171
> Project: Apache Airflow
> Issue Type: Bug
> Components: executors, scheduler
> Affects Versions: 1.10.2
> Reporter: Matt C. Wilson
> Priority: Major
> Attachments: Airflow - Log.png, Airflow - Task Instance Details.htm
>
>
> We are experiencing an issue similar to that reported in AIRFLOW-1641 and
> AIRFLOW-4586. We run two parallel dags, both using a common set of pools,
> both using LocalExecutor.
> What we are seeing is once every couple dozen dag runs, a task will reach the
> `queued` status and not continue into a `running` state once a pool slot is
> open / dependencies are filled.
> Investigating the task instance details confirms the same; Airflow reports
> that it expects the task to commence shortly once resources are available.
> See attachment. [^Airflow - Task Instance Details.htm]
> While tasks are in this state, the sibling parallel dag is able to flow
> completely, even multiple times through. So we know the issue is not with
> pool constraints, executor issues, etc. The problem really seems to be that
> Airflow has simply lost track of the task and failed to start it.
> Clearing the task state has no effect - the task does not get moved back into
> a `scheduled` or `queued` or `running` state, it just stays at the `none`
> state. The task must be marked as `failed` or `success` to resume normal dag
> flow.
> This issue has been causing sporadic production degradation for us, with no
> obvious avenue for troubleshooting. It's not clear if changing the
> `dagbag_import_timeout` (as reported in 1641) will help because our task has
> no log showing in the Airflow UI. See screenshot. !Airflow - Log.png!
> I'm open to all recommendations to try to get to the bottom of this. Please
> let me know if there is any log data or other info I can provide.
>
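Since the failure is sporadic and leaves no task log, one way to catch it as it happens is to periodically scan task instance records for anything that has sat in `queued` for too long. The following is only a minimal sketch under stated assumptions: `find_stuck_queued` is a hypothetical helper, not part of Airflow, and it works on plain dicts whose keys (`dag_id`, `task_id`, `state`, `queued_dttm`) are modeled on columns of the `task_instance` metadata table. How the rows are fetched and the 30-minute threshold are left to the reader.

```python
from datetime import datetime, timedelta


def find_stuck_queued(task_instances, now, max_queued=timedelta(minutes=30)):
    """Return the records that have been in `queued` longer than max_queued.

    task_instances: iterable of dicts with at least `state` and `queued_dttm`
    (hypothetical shape, modeled on the task_instance table columns).
    """
    stuck = []
    for ti in task_instances:
        if ti["state"] != "queued":
            continue
        queued_at = ti.get("queued_dttm")
        # A queued record with no timestamp is treated as suspicious too.
        if queued_at is None or now - queued_at > max_queued:
            stuck.append(ti)
    return stuck


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    now = datetime(2019, 12, 17, 12, 0)
    rows = [
        {"dag_id": "d1", "task_id": "a", "state": "running",
         "queued_dttm": now - timedelta(hours=2)},
        {"dag_id": "d1", "task_id": "b", "state": "queued",
         "queued_dttm": now - timedelta(hours=1)},
        {"dag_id": "d2", "task_id": "c", "state": "queued",
         "queued_dttm": now - timedelta(minutes=5)},
    ]
    for ti in find_stuck_queued(rows, now):
        print(ti["dag_id"], ti["task_id"], ti["queued_dttm"])
```

Logging the matches along with the scheduler and executor logs from the same time window may help narrow down where the task was dropped.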