Hello Harish,

Based on our understanding of Python multiprocessing, a task instance gets
a record in the underlying database only after that library makes an
explicit call to airflow (using the Local Executor). So you won't find a
record in the database until that task instance has been initiated. I
might be wrong in these assumptions and would love to be corrected if
that's the case.

We have been using the latest only operator and it seems to be working well
for skipping tasks if they are not current (basically avoiding backfill by
marking all tasks below the latest only operator as skipped). It's present
in the master branch as of now, and I would recommend you look at that
operator for backfill.
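
To illustrate what "not current" means here, below is a rough sketch of
the idea in plain Python. It is not the operator's actual code: the
function name is_latest_run and the fixed-interval window check are
assumptions made for illustration. A run counts as the latest one only
if "now" falls in the schedule interval right after its execution date.

```python
from datetime import datetime, timedelta


def is_latest_run(execution_date, interval, now):
    """Return True if `now` falls in the window right after this run's period.

    Hypothetical helper: assumes a fixed schedule interval (e.g. hourly).
    """
    window_start = execution_date + interval  # when this run becomes due
    window_end = window_start + interval      # when the next run becomes due
    return window_start < now <= window_end


if __name__ == "__main__":
    now = datetime(2016, 12, 4, 5, 23)
    hourly = timedelta(hours=1)
    # An old run from the start of November would have its downstream skipped.
    print(is_latest_run(datetime(2016, 11, 1, 0, 0), hourly, now))
    # The run covering 04:00-05:00 on Dec 4 is the current one.
    print(is_latest_run(datetime(2016, 12, 4, 4, 0), hourly, now))
```

Anything downstream of such a check would be marked skipped for every
run except the most recent one.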

Thanks!
Vikas

On Dec 4, 2016 5:23 AM, "harish singh" <[email protected]> wrote:

Hi all,

We have been running Airflow in our production for over 8-9 months now.
I know there is a separate thread in place for Airflow 2.0.
But I was not sure if any of the prior versions has this fixed. If not, I
will add this to the other email thread for 2.0.

When I run airflow backfill with "-m" (mark jobs as succeeded without
running them), is there a way to optimize this call?

For example:
airflow backfill TEST_DAG -s 2016-11-01T00:00:00 -e 2016-12-01T00:00:00 -m

Here, I am running backfill for a month (from 1st Nov to 1st Dec),
essentially marking the jobs as succeeded without running them.

It has been more than an hour and the backfill has managed to reach only
up to 2nd Nov.
This seems to be very slow when there is no need to even run the tasks.


I am running Airflow 1.7.0:
These are my related configuration settings:

parallelism = 50
dag_concurrency = 20
max_active_runs_per_dag = 8

Also, I have around 9 DAGs running (all hourly). The other 8 DAGs are
running as scheduled with a start_date of 2016-11-01T00:00:00.

My question is, since I am only marking the jobs as "succeeded"
without running them,
can this be done with one SQL query, instead of on a per-hour, per-task
basis? Maybe find all the TaskInstances that need to be marked succeeded
and then just run one SQL statement?
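
A minimal sketch of what such a bulk approach could look like, using an
in-memory SQLite database. Everything here is an assumption made for
illustration: the task_instance table is a simplified stand-in rather
than Airflow's real schema, and mark_success_bulk, hourly_dates and
make_demo_db are hypothetical helpers, not Airflow functions. Because
rows may not exist yet for task instances that never started, the sketch
upserts rows instead of only updating them, and it does so in a single
transaction.

```python
import sqlite3
from datetime import datetime, timedelta


def hourly_dates(start, end):
    """Yield every hourly timestamp from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(hours=1)


def make_demo_db():
    """Create an in-memory database with a simplified stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance ("
        " task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT,"
        " PRIMARY KEY (task_id, dag_id, execution_date))"
    )
    return conn


def mark_success_bulk(conn, dag_id, task_ids, start, end):
    """Upsert one 'success' row per (task, hour) in a single transaction."""
    rows = [
        (task_id, dag_id, ts.isoformat(), "success")
        for ts in hourly_dates(start, end)
        for task_id in task_ids
    ]
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO task_instance "
            "(task_id, dag_id, execution_date, state) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    written = mark_success_bulk(
        conn,
        "TEST_DAG",
        ["task_a", "task_b"],  # hypothetical task ids
        datetime(2016, 11, 1),
        datetime(2016, 12, 1),
    )
    print(written, "task instances marked as success")
```

Writing directly to the metadata database bypasses whatever bookkeeping
the backfill command does, so this is only meant to show the shape of
the idea.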

I may not be aware of a lot of things here, and it is very possible that
I am assuming a lot of things incorrectly.
Please feel free to correct me.


Thanks,
Harish
