belgacea created AIRFLOW-3335:
---------------------------------
Summary: Bulk backfill & faster mark_success
Key: AIRFLOW-3335
URL: https://issues.apache.org/jira/browse/AIRFLOW-3335
Project: Apache Airflow
Issue Type: Improvement
Components: backfill
Reporter: belgacea
I'm using Airflow to schedule Spark jobs and I wanted to `backfill`
a large time range (to catch up DAGs that are far behind their schedules). I
used the `backfill` command with the `mark_success` argument, expecting
all DAG runs to be marked as successful within a second, but Airflow seems to
mark DAG runs one by one (with some parallelization, governed by the
`parallelism`/`dag_concurrency` configuration). Each DAG run takes approximately 2
seconds to be marked as successful, which makes the backfill process really slow
for a large time range (or for small `time intervals`).
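
To give a feel for how this adds up, here is a rough back-of-the-envelope sketch. It only uses the figure quoted above (about 2 seconds per DAG run). The date range, the `parallelism` value of 16 and the helper names `count_runs` and `estimated_seconds` are assumptions made up for illustration; they are not Airflow APIs, and real timings will differ.

```python
import math
from datetime import datetime, timedelta


def count_runs(start, end, interval):
    """Count schedule points in [start, end], both ends included."""
    if end < start:
        return 0
    return (end - start) // interval + 1


def estimated_seconds(n_runs, seconds_per_run=2.0, parallelism=1):
    """Rough wall-clock estimate, assuming runs are marked in batches
    of `parallelism` and each batch takes `seconds_per_run`."""
    return math.ceil(n_runs / parallelism) * seconds_per_run


# Hypothetical backfill window: one calendar year.
start = datetime(2018, 1, 1)
end = datetime(2018, 12, 31)

daily_runs = count_runs(start, end, timedelta(days=1))
hourly_runs = count_runs(start, end, timedelta(hours=1))

# One at a time vs. an assumed parallelism of 16.
daily_serial = estimated_seconds(daily_runs)
hourly_parallel = estimated_seconds(hourly_runs, parallelism=16)

print(daily_runs, daily_serial)
print(hourly_runs, hourly_parallel)
```

Under these assumptions the cost grows linearly with the number of DAG runs, so a smaller interval over the same range multiplies the total time accordingly.
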
Is there a way to speed up `mark_success` backfilling? Also, is there a
way to tell the Airflow scheduler to backfill a DAG with a single instance per
task covering the specified backfill time range (`start_date` + `end_date`), and
then mark all DAG runs within that range as successful?
Note: the DAG I tried to backfill does not set `depends_on_past`.
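
To illustrate the kind of bulk behaviour I have in mind, here is a toy sketch that marks every run in a date range with a single statement instead of one by one. It uses an in-memory SQLite table named `toy_dag_run` that I made up for the example; it is not Airflow's metadata schema, `bulk_mark_success` is a hypothetical helper, and I am not suggesting that writing to the metadata database directly is supported.

```python
import sqlite3
from datetime import datetime, timedelta

# Toy stand-in table (hypothetical, NOT Airflow's real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_dag_run (dag_id TEXT, execution_date TEXT, state TEXT)"
)

# Seed one run per day for a year, all in the 'running' state.
first_day = datetime(2018, 1, 1)
rows = [
    ("spark_job", (first_day + timedelta(days=i)).isoformat(), "running")
    for i in range(365)
]
conn.executemany("INSERT INTO toy_dag_run VALUES (?, ?, ?)", rows)


def bulk_mark_success(conn, dag_id, start, end):
    """Mark all runs of `dag_id` in [start, end] as 'success' with one UPDATE.

    ISO-8601 strings in a uniform format sort chronologically, so BETWEEN
    on the text column selects the intended date range.
    """
    cur = conn.execute(
        "UPDATE toy_dag_run SET state = 'success' "
        "WHERE dag_id = ? AND execution_date BETWEEN ? AND ?",
        (dag_id, start.isoformat(), end.isoformat()),
    )
    conn.commit()
    return cur.rowcount


updated = bulk_mark_success(
    conn, "spark_job", datetime(2018, 3, 1), datetime(2018, 3, 31)
)
print(updated)
```

The point is only that the work is proportional to one statement rather than to the number of runs; how this would map onto Airflow's own models is exactly what I am asking about.
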
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)