Hello Harish,

Based on our understanding of Python multiprocessing, a task instance gets a record in the underlying database only after that library makes an explicit call to Airflow (when using the LocalExecutor). So you won't find a record in the database until that task instance has actually been initiated. I might be wrong in these assumptions and would love to be corrected if that's the case.
We have been using the latest only operator, and it seems to work well for skipping tasks that are not current (basically avoiding backfill by marking all tasks downstream of the latest only operator as skipped). It is present in the master branch as of now, and I would recommend looking at that operator for backfill.

Thanks!
Vikas

On Dec 4, 2016 5:23 AM, "harish singh" <[email protected]> wrote:

Hi all,

We have been running Airflow in production for over 8-9 months now. I know there is a separate thread in place for Airflow 2.0, but I was not sure whether any prior version already has this fixed. If not, I will add this to the other email thread for 2.0.

When I run airflow backfill with "-m" (mark jobs as succeeded without running them), is there a way to optimize this call? For example:

    airflow backfill TEST_DAG -s 2016-11-01T00:00:00 -e 2016-12-01T00:00:00 -m

Here, I am running a backfill for a month (from 1st Nov to 1st Dec), essentially marking the jobs as succeeded without running them. It has been more than an hour and the backfill has only reached 2nd Nov. This seems very slow when there is no need to even run the tasks.

I am running Airflow 1.7.0. These are my related configuration settings:

    parallelism = 50
    dag_concurrency = 20
    max_active_runs_per_dag = 8

Also, I have around 9 DAGs running (all hourly). The other 8 DAGs are running as scheduled with a start_date of 2016-11-01T00:00:00.

My question is: since I am only marking the jobs as "succeeded" without running them, can this be done with one SQL query instead of on a per-hour, per-task basis? Maybe find all the TaskInstances that need to be marked succeeded and then just run a single SQL statement?

I may not be aware of a lot of things here, and it is very possible that I am assuming a lot of things incorrectly. Please feel free to correct me.

Thanks,
Harish
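
For illustration only, here is a rough sketch of the "do it in one pass" idea from the question, combined with the point from the reply that a record may not exist yet for a task instance that was never initiated. It does not use Airflow at all: the table task_instance_demo, its columns, and the task names "extract" and "load" are hypothetical stand-ins and not Airflow's real schema, and the end date is assumed to be inclusive. It only shows writing every (task, hour) row as "success" inside a single transaction, inserting rows that are missing and overwriting ones that exist.

```python
import sqlite3
from datetime import datetime, timedelta


def hourly_dates(start, end):
    """Yield hourly execution dates from start to end (end assumed inclusive)."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)


def mark_success_bulk(conn, dag_id, task_ids, start, end):
    """Write a 'success' row for every (task, hour) pair in one transaction.

    INSERT OR REPLACE covers both cases: rows that already exist are
    overwritten, and rows that were never created are inserted.
    """
    rows = [
        (dag_id, task_id, date.isoformat(), "success")
        for date in hourly_dates(start, end)
        for task_id in task_ids
    ]
    with conn:  # one transaction, committed once at the end
        conn.executemany(
            "INSERT OR REPLACE INTO task_instance_demo "
            "(dag_id, task_id, execution_date, state) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


# Hypothetical, simplified table; NOT the real Airflow task_instance schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance_demo ("
    " dag_id TEXT, task_id TEXT, execution_date TEXT, state TEXT,"
    " PRIMARY KEY (dag_id, task_id, execution_date))"
)
# One pre-existing row in a non-success state, to show it gets overwritten.
conn.execute(
    "INSERT INTO task_instance_demo VALUES (?, ?, ?, ?)",
    ("TEST_DAG", "extract", "2016-11-01T00:00:00", "failed"),
)

written = mark_success_bulk(
    conn,
    "TEST_DAG",
    ["extract", "load"],  # hypothetical task ids
    datetime(2016, 11, 1),
    datetime(2016, 12, 1),
)
```

Whether something like this is safe against a live Airflow metadata database is a separate question: the scheduler and running tasks also write to it, so this is a sketch of the idea rather than a recommended procedure.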
