[ https://issues.apache.org/jira/browse/AIRFLOW-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michal TOMA updated AIRFLOW-194:
--------------------------------
    Attachment: screenshot-2.png

> Task hangs in up_for_retry state for very long
> ----------------------------------------------
>
>                 Key: AIRFLOW-194
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-194
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>         Environment: Airflow 1.7.0 on RHEL 7 and OpenSuse 13.2
>            Reporter: Michal TOMA
>            Assignee: Siddharth Anand
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
> I can observe this problem on 2 separate Airflow installations.
> The symptoms are:
> - One (and only one) task stays in the up_for_retry state even when the last
> of the retries finished with an OK status.
> - It is yellow in the tree view.
> - The execution somehow resumes several hours later automatically.
> - It seems (not a certainty) to be related to a situation where the task
> execution is lagging behind the normal schedule.
> Here is an example of a task that should run every hour, "0 * * * *":
> Current date : 2016-05-30T15:31:00+0200
> ----- Run 1 ------
> Run ID: 2016-05-05T21:00:00
> Task start: 2015-05-30T07:38:XX.XXX
> Task end: 2015-05-30T08:23:XX.XXX
> Marked as success
> ----- Run 2 ------
> Run ID: 2016-05-05T22:00:00
> Task start: 2015-05-30T11:10:XX.XXX
> Task end: 2015-05-30T11:56:XX.XXX
> Marked as success
> ----- Run 3 ------
> Run ID: 2016-05-05T23:00:00
> Task start: 2015-05-30T11:56:XX.XXX
> Task end: 2015-05-30T12:41:XX.XXX
> Marked as success
> ----- Run 4 ------
> Run ID: 2016-05-06T00:00:00
> Task start: 2015-05-30T15:12:XX.XXX
> Task end: (Still running now)
> Marked as running
> There are nearly 3 hours between the end of Run 1 and the start of Run 2, and
> about 2.5 hours between the end of Run 3 and the start of Run 4.
> Only Run 3 starts immediately after the end of Run 2, which is the expected
> behavior since the runs are far behind schedule (the Run ID is 2016-05-06
> while the current date is 2016-05-30).
> This is a high-priority issue for our setup. I could try to dig deeper into
> this problem, but I have no idea where to look to debug it.
> Any pointers would be more than welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
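
The idle time between consecutive runs can be checked directly from the timestamps quoted in the report. The following is only a minimal sketch: the `runs` list and the `idle_gaps` helper are hypothetical names introduced here for illustration, the seconds (shown as XX.XXX in the report) are truncated to the minute, and the start and end dates are copied exactly as reported. It uses nothing from Airflow itself.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M"

# (run ID, task start, task end) copied from the report.
# Seconds were elided there, so values are truncated to the minute.
# Run 4 has no end time because it was still running when reported.
runs = [
    ("2016-05-05T21:00:00", "2015-05-30T07:38", "2015-05-30T08:23"),
    ("2016-05-05T22:00:00", "2015-05-30T11:10", "2015-05-30T11:56"),
    ("2016-05-05T23:00:00", "2015-05-30T11:56", "2015-05-30T12:41"),
    ("2016-05-06T00:00:00", "2015-05-30T15:12", None),
]


def idle_gaps(runs):
    """Return (previous run ID, next run ID, idle minutes) for consecutive runs."""
    gaps = []
    for (prev_id, _, prev_end), (next_id, next_start, _) in zip(runs, runs[1:]):
        delta = datetime.strptime(next_start, FMT) - datetime.strptime(prev_end, FMT)
        gaps.append((prev_id, next_id, int(delta.total_seconds() // 60)))
    return gaps


for prev_id, next_id, minutes in idle_gaps(runs):
    print(f"{prev_id} -> {next_id}: {minutes} min idle")
```

With the reported values this gives 167 minutes of idle time before Run 2, 0 minutes before Run 3, and 151 minutes before Run 4, so only Run 3 started as soon as the previous run had finished.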