[jira] [Commented] (AIRFLOW-1143) Tasks rejected by workers get stuck in QUEUED

Dan Davydov (JIRA) Thu, 27 Apr 2017 11:12:00 -0700

    [ https://issues.apache.org/jira/browse/AIRFLOW-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987188#comment-15987188 ]


Dan Davydov commented on AIRFLOW-1143:
--------------------------------------

I agree the NONE solution is OK for now. We need to be careful, though: for 
example, if the "task is already in running state" dep fails, we don't want to 
set the state to NONE. Long term, I think we want something more explicit: 
fixing the heartbeat so it supports this, and having the scheduler set the 
state instead of the worker. The reason is that all state changes should 
eventually be made by the scheduler (workers should be "dumb").
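Below is a minimal sketch of the interim rule described above. It assumes a worker that knows which dependency checks failed when it rejected a task. The names `State`, `TaskInstance`, `handle_rejection`, and `ALREADY_RUNNING_DEP` are hypothetical illustrations, not Airflow's actual API. The sketch only shows that every rejection resets the task to NONE, except the "already running" case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class State(Enum):
    # Hypothetical subset of task-instance states, for illustration only.
    NONE = "none"
    QUEUED = "queued"
    RUNNING = "running"


# Hypothetical name for the dependency check that fails when another
# process is already running this task instance.
ALREADY_RUNNING_DEP = "task_not_already_running"


@dataclass
class TaskInstance:
    task_id: str
    state: State


def handle_rejection(ti: TaskInstance, failed_deps: List[str]) -> State:
    """Decide which state a worker leaves behind after rejecting a task."""
    if ALREADY_RUNNING_DEP in failed_deps:
        # Another process owns this task; resetting it to NONE would
        # clobber a legitimately running task, so leave the state alone.
        return ti.state
    # Any other rejection (e.g. a full pool): hand the task back to the
    # scheduler instead of leaving it stranded in QUEUED.
    ti.state = State.NONE
    return ti.state
```

In the longer-term design the comment proposes, the scheduler rather than the worker would make this state change.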

I don't see the message "FIXME: Rescheduling due to concurrency limits", but 
it looks like a new TODO, and we are running off a subset of master at the 
moment (between 1.8.0 and 1.8.1, with some commits from after 1.8.1), so I'm 
guessing that's why.

> Tasks rejected by workers get stuck in QUEUED
> ---------------------------------------------
>
>                 Key: AIRFLOW-1143
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1143
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Dan Davydov
>            Assignee: Gerard Toonstra
>
> If the scheduler schedules a task that is sent to a worker, and the worker 
> then rejects the task (e.g. because one of the task's dependencies is no 
> longer met, such as its pool having become full), the task will be stuck in 
> the QUEUED state. We hit this while trying to switch from invoking the 
> scheduler as "airflow scheduler -n 5" to plain "airflow scheduler".
> Restarting the scheduler fixes this because it cleans up orphans, but we 
> shouldn't have to restart the scheduler to fix these problems (the missing 
> job heartbeats should make the scheduler requeue the task).
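The issue description asks for the scheduler, rather than a restart, to requeue tasks whose jobs stop heartbeating. That behavior could look roughly like the following sketch. It assumes each queued task records when it was queued and, optionally, its job's last heartbeat. `QueuedTask`, `requeue_stale`, the string states, and the `timeout` parameter are hypothetical and not part of Airflow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class QueuedTask:
    # Hypothetical record; not Airflow's TaskInstance model.
    task_id: str
    state: str
    queued_at: datetime
    last_heartbeat: Optional[datetime] = None  # None: no job ever heartbeated


def requeue_stale(tasks: List[QueuedTask], now: datetime,
                  timeout: timedelta) -> List[str]:
    """Reset QUEUED tasks with no recent job heartbeat so they get rescheduled."""
    requeued = []
    for t in tasks:
        if t.state != "queued":
            continue  # only stuck QUEUED tasks are candidates
        # Latest sign of life: the job heartbeat if any, otherwise queue time.
        last_seen = t.last_heartbeat or t.queued_at
        if now - last_seen > timeout:
            t.state = "none"  # hand the task back to the scheduling loop
            requeued.append(t.task_id)
    return requeued
```

Running such a check on every scheduler loop, rather than only as orphan cleanup at startup, is what would remove the need to restart the scheduler.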



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
