Boris -

The pull request includes an airflow.cfg config entry to set backfill=False
by default.

[scheduler]
backfill_by_default=(*true*|false)
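
As a rough illustration only (this is not the code in the pull request), reading such an entry with Python's standard configparser might look like the sketch below. The option name is the one shown above. The helper name, the sample text, and the fallback of true when the entry is missing are assumptions made for the example.

```python
import configparser

# Hypothetical airflow.cfg fragment; only the option name comes from the message above.
SAMPLE_CFG = """
[scheduler]
backfill_by_default = false
"""


def backfill_by_default(cfg_text: str) -> bool:
    """Return the scheduler-wide default, assuming True when the entry is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # getboolean accepts true/false, yes/no, on/off and 1/0 (case-insensitive).
    return parser.getboolean("scheduler", "backfill_by_default", fallback=True)


print(backfill_by_default(SAMPLE_CFG))      # entry present and set to false
print(backfill_by_default("[scheduler]\n"))  # entry missing, fallback applies
```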


Thanks,
Ben

*--*
*ben tallman* | *apigee* | m: +1.503.680.5709 | o: +1.503.608.7552
twitter @anonymousmanage @apigee

On Fri, Oct 14, 2016 at 7:13 PM, Boris Tyukin <bo...@boristyukin.com> wrote:

> Done. It makes sense to me, Ben, as the backfill concept is very confusing
> to me. I think it should even be off by default.
>
> On 2016-10-13 12:05 (-0400), Ben Tallman <b...@apigee.com> wrote:
> > Boris -
> >
> > We have a pull request in that stops the scheduler from backfilling on a
> > per-DAG basis. It is designed for exactly this situation. Basically, if the
> > flag is set, the scheduler skips the missed intervals and jumps to the last
> > one in the list. If this is important to you, please vote for it.
> >
> > https://github.com/apache/incubator-airflow/pull/1830
> >
> > For instance:
> > dag = DAG(
> >     "test_dag_id_here",
> >     backfill=False,  # per-DAG flag proposed in the pull request above
> >     # ... remaining DAG arguments
> > )
> >
> >
> >
> > Thanks,
> > Ben
> >
> >
> > On Thu, Oct 13, 2016 at 8:07 AM, Joseph Napolitano <
> > joseph.napolit...@blueapron.com.invalid> wrote:
> >
> > > Hi Boris,
> > >
> > > To answer the first question, the backfill command has a flag to mark jobs
> > > as successful without running them.  Take care to align the start and end
> > > times precisely as needed.  As an example, for a job that runs daily at
> > > 7am:
> > >
> > > airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m
> > >
> > > The "-m" parameter tells Airflow to mark the runs successful without
> > > running them.
> > >
> > > On Thu, Oct 13, 2016 at 10:46 AM, Boris Tyukin <bo...@boristyukin.com>
> > > wrote:
> > >
> > > > Hello all and thanks for such an amazing project! I have been evaluating
> > > > Airflow and spent a few days reading about it and playing with it, and
> > > > there are a few things I am struggling to understand.
> > > >
> > > > Let's say I have a simple DAG that runs once a day and does a full
> > > > reload of tables from the source database, so the process is not
> > > > incremental.
> > > >
> > > > Let's consider this scenario:
> > > >
> > > > Day 1 - OK
> > > >
> > > > Day 2 - airflow scheduler or server with airflow is down for some reason
> > > > (or DAG is paused)
> > > >
> > > > Day 3 - still down (or DAG is paused)
> > > >
> > > > Day 4 - server is up and now needs to run missing jobs.
> > > >
> > > >
> > > > How can I make Airflow run only the Day 4 job and not backfill Days 2
> > > > and 3?
> > > >
> > > >
> > > > I tried setting depends_on_past = True but it does not seem to do the
> > > > trick.
> > > >
> > > >
> > > > I also found this in a roadmap doc, but it seems it has not made it into
> > > > a release yet:
> > > >
> > > >
> > > >  Only Run Latest - Champion : Sid
> > > >
> > > > • For cases where we need to only run the latest in a series of task
> > > > instance runs and mark the others as skipped. For example, we may have a
> > > > job to execute a DB snapshot every day. If the DAG is paused for 5 days
> > > > and then unpaused, we don’t want to run all 5, just the latest. With this
> > > > feature, we will provide “cron” functionality for task scheduling that is
> > > > not related to ETL
> > > >
> > > >
> > > > My second question: what if I have another DAG that does incremental
> > > > loads from a source table?
> > > >
> > > >
> > > > Day 1 - OK, loaded new/changed data for previous day
> > > >
> > > > Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
> > > >
> > > > Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
> > > >
> > > > Day 4 - source system is up, Airflow DagRun succeeded
> > > >
> > > >
> > > > My problem (unless I am missing something) is that Airflow on Day 4 would
> > > > use the execution time from Day 3, so the interval for the incremental
> > > > load would start at the last run (which failed). My hope is that it would
> > > > use the last _successful_ run, so on Day 4 it would go back to Day 1. Is
> > > > it possible to achieve this?
> > > >
> > > > I am aware of a manual backfill command via the CLI, but I am not sure I
> > > > want to use it due to all the issues and inconsistencies I've read about.
> > > >
> > > > Thanks!
> > > >
> > >
> > >
> > >
> > > --
> > > *Joe Napolitano *| Sr. Data Engineer
> > > www.blueapron.com | 5 Crosby Street, New York, NY 10013
> > >
> >
>
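
To make the quoted discussion concrete, here is a minimal sketch of the difference between backfilling every missed interval and jumping straight to the latest one. It is a simplified model rather than Airflow's scheduler code: the function intervals_to_run, its arguments, and the sample dates are hypothetical, and it ignores Airflow's actual execution-date conventions.

```python
from datetime import datetime, timedelta


def intervals_to_run(last_run, now, interval, backfill=True):
    """List the run dates that fall after last_run and no later than now.

    With backfill=False only the most recent due date is kept, which mirrors
    the "skip intervals and jump to the last one" behaviour described above.
    Hypothetical helper for illustration, not part of Airflow.
    """
    due = []
    next_date = last_run + interval
    while next_date <= now:
        due.append(next_date)
        next_date += interval
    if not backfill:
        # Slicing keeps this safe when nothing is due (returns an empty list).
        return due[-1:]
    return due


# Day 1 ran; the scheduler was down on Days 2 and 3 and comes back on Day 4.
day1 = datetime(2016, 10, 10, 7)
day4 = datetime(2016, 10, 13, 7)
one_day = timedelta(days=1)

print(intervals_to_run(day1, day4, one_day))                  # Days 2, 3 and 4
print(intervals_to_run(day1, day4, one_day, backfill=False))  # Day 4 only
```

In this model the full-reload DAG from the first question would run once on Day 4 when the flag is off, instead of three times.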
