I put up a draft AIP for scheduler-managed backfill here:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-78+Scheduler-managed+backfill

Quick summary:

TL;DR: move backfill from the CLI process to the scheduler.

Backfill is currently a CLI-only feature that, in effect, runs a scheduler
locally in the CLI process.  We don't have good visibility into backfill
jobs in the web UI, and users without CLI access cannot use the feature at
all.  Additionally, from a project maintenance perspective, it is not ideal
to have a "second scheduler".

This AIP focuses specifically on moving management of backfill jobs to the
scheduler.  This does take something away from users.  Previously you could
run backfill in local mode, which would not only schedule the backfill
locally but also run all of its tasks locally.  That option will go away.
The scheduler will also have more work to do, in proportion to how much
backfill is used, and it will become somewhat more complex since it will
have to manage backfill runs too.

This AIP interacts with some other AIPs.  For example, backfill is
fundamentally about data completeness, and the data awareness AIPs may
change what completeness can mean in Airflow.

I look forward to your feedback.

Thanks
