[ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550894#comment-16550894 ]
Chris Bandy commented on AIRFLOW-593:
-------------------------------------
https://lists.apache.org/thread.html/ef9ab995d019590eb7b072a74efca2a160b9a4916b6c1618c2ab762b@%3Cdev.airflow.apache.org%3E
> Tasks do not get backfilled sequentially
> ----------------------------------------
>
> Key: AIRFLOW-593
> URL: https://issues.apache.org/jira/browse/AIRFLOW-593
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun, scheduler
> Affects Versions: Airflow 1.7.1.3
> Reporter: Jong Kim
> Priority: Minor
> Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need the tasks within a DAG to complete in order when running
> backfills. I am running locally on my Mac using the SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks,
> which must complete in order. task0 -> task1 -> task2. This dependency is set
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG using the on/off
> toggle in the webserver, and run "airflow scheduler", which
> automatically backfills starting from start_date.
> It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run
> sequentially, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow run the tasks grouped by
> task sequence number, something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to
> run in the order I need, but instead it runs some tasks out of order, like
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0 <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2 <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG:
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
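
To make the question above concrete, here is a minimal sketch in plain Python (no Airflow import) that models the ordering rules as the Airflow documentation describes them: 'depends_on_past' makes a task wait for its own previous instance, and 'wait_for_downstream' additionally makes it wait for the tasks immediately downstream of that previous instance. The helper names prerequisites() and is_allowed() are hypothetical, and the model assumes one task runs at a time, as with SequentialExecutor. It is an illustration of the documented semantics, not the scheduler's actual code.

```python
TASKS = ["task0", "task1", "task2"]      # task0 -> task1 -> task2
DATES = ["2016-10-20", "2016-10-21"]     # the two backfilled execution dates


def prerequisites(date, task, depends_on_past, wait_for_downstream):
    """Return the (date, task) instances that must finish before this one starts."""
    d, t = DATES.index(date), TASKS.index(task)
    needed = set()
    if t > 0:
        # Upstream task in the same DAG run (set via .set_downstream()).
        needed.add((date, TASKS[t - 1]))
    if d > 0 and (depends_on_past or wait_for_downstream):
        # Same task in the previous run; wait_for_downstream implies depends_on_past.
        needed.add((DATES[d - 1], task))
    if d > 0 and wait_for_downstream and t + 1 < len(TASKS):
        # Only the *immediately* downstream task of the previous run.
        needed.add((DATES[d - 1], TASKS[t + 1]))
    return needed


def is_allowed(order, depends_on_past=False, wait_for_downstream=False):
    """Check that every instance in `order` starts after all of its prerequisites."""
    done = set()
    for date, task in order:
        needed = prerequisites(date, task, depends_on_past, wait_for_downstream)
        if not needed <= done:
            return False
        done.add((date, task))
    return True


# Ordering reported in the issue with both flags enabled.
observed = [
    ("2016-10-20", "task0"),
    ("2016-10-20", "task1"),
    ("2016-10-21", "task0"),
    ("2016-10-20", "task2"),
    ("2016-10-21", "task1"),
    ("2016-10-21", "task2"),
]
# Ordering the reporter wants: finish one date before starting the next.
desired = [(d, t) for d in DATES for t in TASKS]
# Ordering reported with 'depends_on_past': False (grouped by task).
grouped = [(d, t) for t in TASKS for d in DATES]
```

Under this model, is_allowed(observed, True, True) is true: task0 for 2016/10/21 only has to wait for task0 and task1 of 2016/10/20, not for task2, so the reported order is consistent with the documented flags rather than a violation of them. The flags alone therefore do not guarantee the desired date-by-date order. Limiting the DAG to one active run at a time (the DAG-level max_active_runs parameter) is a commonly suggested alternative, though whether it behaves that way on 1.7.1.3 would need to be checked.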