[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551139#comment-16551139 ]

Chris Bandy commented on AIRFLOW-593:
-------------------------------------

I can see that every instance of `load_transactions` believes it is the first 
instance:

{noformat}
[2018-07-20 18:23:53,724] {models.py:1216} DEBUG - <TaskInstance: x.load_transactions 2014-10-29 06:00:00 [scheduled]> dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.

[2018-07-20 18:23:53,774] {models.py:1216} DEBUG - <TaskInstance: x.load_transactions 2014-10-30 06:00:00 [scheduled]> dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.

[2018-07-20 18:24:33,619] {models.py:1216} DEBUG - <TaskInstance: x.load_transactions 2014-11-01 06:00:00 [scheduled]> dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.

[2018-07-20 18:24:33,689] {models.py:1216} DEBUG - <TaskInstance: x.load_transactions 2014-10-31 06:00:00 [scheduled]> dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.

[2018-07-20 18:24:59,968] {models.py:1216} DEBUG - <TaskInstance: x.load_transactions 2014-11-02 06:00:00 [scheduled]> dependency 'Previous Dagrun State' PASSED: True, This task instance was the first task instance for its task.
{noformat}


> Tasks do not get backfilled sequentially
> ----------------------------------------
>
>                 Key: AIRFLOW-593
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-593
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Jong Kim
>            Priority: Minor
>         Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need the tasks within a DAG to complete in order when running
> backfills. I am running locally on my Mac using the SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a 
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, 
> which must complete in order. task0 -> task1 -> task2. This dependency is set 
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG using the on/off
> toggle in the webserver, and run "airflow scheduler", which
> automatically backfills starting from start_date.
> It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run
> sequentially, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by
> sequence number, something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to
> run in the order I need, but instead it runs some tasks out of order, like
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and 
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG: 
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
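
The three orderings in the description can be checked against a small model of the two flags. This is only a sketch in plain Python, not Airflow code. It assumes the documented semantics: 'depends_on_past' makes a task instance wait for the same task in the previous run, and 'wait_for_downstream' additionally makes it wait for the tasks immediately downstream of that previous instance. The names TASKS, DATES and is_allowed are hypothetical and exist only for this illustration.

```python
from datetime import datetime

# Linear chain from the description: task0 -> task1 -> task2
TASKS = ["task0", "task1", "task2"]
DATES = [datetime(2016, 10, 20, 11, 0, 0), datetime(2016, 10, 21, 11, 0, 0)]


def is_allowed(order, depends_on_past, wait_for_downstream):
    """Return True if `order`, a list of (date, task) pairs, breaks none of
    the modelled rules. Simplified model, not Airflow's implementation."""
    done = set()
    for date, task in order:
        t = TASKS.index(task)
        d = DATES.index(date)
        # The upstream task in the same run must already have finished.
        if t > 0 and (date, TASKS[t - 1]) not in done:
            return False
        if depends_on_past and d > 0:
            prev = DATES[d - 1]
            # The same task in the previous run must have finished.
            if (prev, task) not in done:
                return False
            # The task immediately downstream of the previous instance
            # must have finished as well.
            has_downstream = t + 1 < len(TASKS)
            if wait_for_downstream and has_downstream:
                if (prev, TASKS[t + 1]) not in done:
                    return False
        done.add((date, task))
    return True


# Order the reporter wants: all tasks of one date, then the next date.
by_date = [(d, t) for d in DATES for t in TASKS]
# Order seen with depends_on_past=False: grouped by task.
by_task = [(d, t) for t in TASKS for d in DATES]
# Order seen with depends_on_past=True and wait_for_downstream=True.
reported = [
    (DATES[0], "task0"),
    (DATES[0], "task1"),
    (DATES[1], "task0"),
    (DATES[0], "task2"),
    (DATES[1], "task1"),
    (DATES[1], "task2"),
]

if __name__ == "__main__":
    print("by_date, both flags:", is_allowed(by_date, True, True))
    print("by_task, no flags:  ", is_allowed(by_task, False, False))
    print("by_task, both flags:", is_allowed(by_task, True, True))
    print("reported, both flags:", is_allowed(reported, True, True))
```

Under this model the reported order does not violate either flag, because the 2016-10-21 task0 only waits for the 2016-10-20 task0 and task1, not for task2. If the model matches what Airflow does, the two flags alone would not force strict run-by-run ordering.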



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
