Boris -

We have a pull request in that lets you tell the scheduler not to backfill,
on a per-DAG basis. It is designed for exactly this situation. Basically, if
this flag is set, the scheduler skips the missed intervals and jumps to the
last one in the list. If this is important to you, please vote for it.

https://github.com/apache/incubator-airflow/pull/1830

For instance:
dag = DAG(
    "test_dag_id_here",
    backfill=False,  # the per-DAG flag proposed in the pull request
    # ... other DAG arguments
)
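
To make the skipping behaviour concrete, here is a rough sketch in plain
Python. It is not the code from the pull request and does not use Airflow at
all; runs_to_schedule and its arguments are hypothetical names made up for
illustration, and it assumes a run becomes due once the interval it covers has
closed.

```python
from datetime import datetime, timedelta


def runs_to_schedule(last_run, now, interval, backfill=True):
    """Return the execution dates that are due after an outage.

    Hypothetical helper, not Airflow code. A run for execution date d is
    treated as due once its interval has closed, i.e. d + interval <= now.
    """
    due = []
    next_date = last_run + interval
    while next_date + interval <= now:
        due.append(next_date)
        next_date += interval
    # With backfill disabled, skip the missed intervals and keep only the
    # last one in the list (or nothing if no run is due yet).
    return due if backfill else due[-1:]


interval = timedelta(days=1)
day1 = datetime(2016, 10, 10)    # last execution date before the outage
now = datetime(2016, 10, 14, 7)  # scheduler comes back up

print(runs_to_schedule(day1, now, interval, backfill=True))
print(runs_to_schedule(day1, now, interval, backfill=False))
```

With backfill left on, the sketch returns all three missed execution dates;
with it off, only the most recent one.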



Thanks,
Ben

*--*
*ben tallman* | *apigee* | m: +1.503.680.5709 | o: +1.503.608.7552 |
twitter @anonymousmanage @apigee

On Thu, Oct 13, 2016 at 8:07 AM, Joseph Napolitano <
joseph.napolit...@blueapron.com.invalid> wrote:

> Hi Boris,
>
> To answer the first question, the backfill command has a flag to mark jobs
> as successful without running them.  Take care to align the start and end
> times precisely as needed.  As an example, for a job that runs daily at
> 7am:
>
> airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m
>
> The "-m" parameter tells Airflow to mark it successful without running it.
>
> On Thu, Oct 13, 2016 at 10:46 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
> > Hello all and thanks for such an amazing project! I have been evaluating
> > Airflow and spent a few days reading about it and playing with it and I
> > have a few questions that I struggle to understand.
> >
> > Let's say I have a simple DAG that runs once a day and it is doing a full
> > reload of tables from the source database so the process is not
> > incremental.
> >
> > Let's consider this scenario:
> >
> > Day 1 - OK
> >
> > Day 2 - Airflow scheduler or server with Airflow is down for some reason
> > (or DAG is paused)
> >
> > Day 3 - still down (or DAG is paused)
> >
> > Day 4 - server is up and now needs to run missing jobs.
> >
> >
> > How can I make Airflow run only the Day 4 job and not backfill Days 2
> > and 3?
> >
> >
> > I tried setting depends_on_past = True but it does not seem to do the
> > trick.
> >
> >
> > I also found this in a roadmap doc, but it seems it has not made it into
> > a release yet:
> >
> >
> >  Only Run Latest - Champion : Sid
> >
> > • For cases where we need to only run the latest in a series of task
> > instance runs and mark the others as skipped. For example, we may have
> > job to execute a DB snapshot every day. If the DAG is paused for 5 days
> > and then unpaused, we don’t want to run all 5, just the latest. With this
> > feature, we will provide “cron” functionality for task scheduling that is
> > not related to ETL
> >
> >
> > My second question: what if I have another DAG that does incremental
> > loads from a source table:
> >
> >
> > Day 1 - OK, loaded new/changed data for previous day
> >
> > Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
> >
> > Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
> >
> > Day 4 - source system is up, Airflow DagRun succeeded
> >
> >
> > My problem (unless I am missing something) is that Airflow on Day 4 would
> > use the execution time from Day 3, so the interval for the incremental
> > load would start from the last run (which failed). My hope is that it
> > would use the last _successful_ run, so on Day 4 it would go back to
> > Day 1. Is it possible to achieve this?
> >
> > I am aware of a manual backfill command via the CLI, but I am not sure I
> > want to use it due to all the issues and inconsistencies I've read about
> > it.
> >
> > Thanks!
> >
>
>
>
> --
> *Joe Napolitano *| Sr. Data Engineer
> www.blueapron.com | 5 Crosby Street, New York, NY 10013
>
