Hi all,

We have been running Airflow in production for 8-9 months now.
I know there is a separate thread in place for Airflow 2.0.
But I was not sure whether any prior version already has this fixed. If not, I
will add this to the other email thread for 2.0.

When I run airflow backfill with "-m" (mark jobs as succeeded without
running them),
is there a way to optimize this call?

For example:
airflow backfill TEST_DAG -s 2016-11-01T00:00:00 -e 2016-12-01T00:00:00 -m

Here, I am running a backfill for one month (from 1st Nov to 1st Dec),
essentially marking the jobs as succeeded without running them.

It has been more than an hour and the backfill has only reached
2nd Nov.
This seems very slow given that there is no need to even run the tasks.


I am running Airflow 1.7.0.
These are my related configuration settings:

parallelism = 50
dag_concurrency = 20
max_active_runs_per_dag = 8

Also, I have around 9 DAGs running (all hourly). The other 8 DAGs are
running as scheduled with a start_date of 2016-11-01T00:00:00.

My question is: since I am only marking the jobs as "succeeded"
without running them,
can this be done in one SQL query, instead of on a per-hour, per-task basis?
Maybe find all the TaskInstances that need to be marked succeeded
and then just run a single SQL statement?
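
To illustrate what I mean, here is a rough sketch. It is not Airflow code:
it uses an in-memory SQLite table that only loosely mimics the task instance
table, and the function name mark_success_bulk, the task names, and the
simplified schema are all made up for the example. It also assumes the rows
for the date range already exist, which may not hold for a real backfill.

```python
import sqlite3
from datetime import datetime, timedelta


def mark_success_bulk(conn, dag_id, start, end):
    """Mark all task instances of dag_id in [start, end] as succeeded
    with a single UPDATE (hypothetical helper, not part of Airflow)."""
    cur = conn.execute(
        "UPDATE task_instance SET state = 'success' "
        "WHERE dag_id = ? AND execution_date BETWEEN ? AND ?",
        # ISO-8601 strings of the same shape compare correctly as text.
        (dag_id, start.isoformat(), end.isoformat()),
    )
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE touched


# Simplified stand-in for the real table (assumed columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT, "
    "PRIMARY KEY (task_id, dag_id, execution_date))"
)

start = datetime(2016, 11, 1)
end = datetime(2016, 12, 1)

# One row per hour per task, for two made-up tasks, state not yet set.
rows = []
ts = start
while ts <= end:
    for task_id in ("task_a", "task_b"):
        rows.append((task_id, "TEST_DAG", ts.isoformat(), None))
    ts += timedelta(hours=1)
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", rows)

updated = mark_success_bulk(conn, "TEST_DAG", start, end)
print("rows marked as success:", updated)
```

The point is only that the whole month is covered by one statement, rather
than one round trip per hour and per task.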

I may not be aware of a lot of things here, and it is very possible that I am
assuming a lot of things incorrectly.
Please feel free to correct me.


Thanks,
Harish
