Done. That makes sense to me, Ben, as the backfill concept is very confusing to me. I think it should even be off by default.
On 2016-10-13 12:05 (-0400), Ben Tallman &lt;[email protected]&gt; wrote:

> Boris -
>
> We have a pull request which causes the scheduler to not backfill on a
> per-DAG basis. This is designed for exactly this situation. Basically, the
> scheduler will skip intervals and jump to the last one in the list if this
> flag is set. If this is important to you, please vote for it.
>
> https://github.com/apache/incubator-airflow/pull/1830
>
> For instance:
>
>     dag = DAG(
>         "test_dag_id_here",
>         backfill=False,
>         ...
>     )
>
> Thanks,
> Ben
>
> *--*
> *ben tallman* | *apigee <http://www.apigee.com/>*
> | m: +1.503.680.5709 | o: +1.503.608.7552 | twitter @anonymousmanage
> <http://twitter.com/anonymousmanage> @apigee <https://twitter.com/apigee>
>
> On Thu, Oct 13, 2016 at 8:07 AM, Joseph Napolitano <
> [email protected]> wrote:
>
> > Hi Boris,
> >
> > To answer the first question, the backfill command has a flag to mark jobs
> > as successful without running them. Take care to align the start and end
> > times precisely as needed.
> > As an example, for a job that runs daily at 7am:
> >
> >     airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m
> >
> > The "-m" parameter tells Airflow to mark it successful without running it.
> >
> > On Thu, Oct 13, 2016 at 10:46 AM, Boris Tyukin <[email protected]>
> > wrote:
> >
> > > Hello all, and thanks for such an amazing project! I have been
> > > evaluating Airflow, spent a few days reading about it and playing with
> > > it, and I have a few questions that I struggle to understand.
> > >
> > > Let's say I have a simple DAG that runs once a day and does a full
> > > reload of tables from the source database, so the process is not
> > > incremental.
> > >
> > > Let's consider this scenario:
> > >
> > > Day 1 - OK
> > >
> > > Day 2 - airflow scheduler or server with airflow is down for some reason
> > > (or DAG is paused)
> > >
> > > Day 3 - still down (or DAG is paused)
> > >
> > > Day 4 - server is up and now needs to run missing jobs.
> > >
> > > How can I make Airflow run only the Day 4 job and not backfill Days 2
> > > and 3?
> > >
> > > I tried depends_on_past=True but it does not seem to do the trick.
> > >
> > > I also found this in a roadmap doc, but it seems it has not made it into
> > > a release yet:
> > >
> > > Only Run Latest - Champion: Sid
> > >
> > > • For cases where we need to only run the latest in a series of task
> > > instance runs and mark the others as skipped. For example, we may have a
> > > job to execute a DB snapshot every day. If the DAG is paused for 5 days
> > > and then unpaused, we don't want to run all 5, just the latest.
> > > With this feature, we will provide "cron" functionality for task
> > > scheduling that is not related to ETL.
> > >
> > > My second question: what if I have another DAG that does incremental
> > > loads from a source table:
> > >
> > > Day 1 - OK, loaded new/changed data for the previous day
> > >
> > > Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
> > >
> > > Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
> > >
> > > Day 4 - source system is up, Airflow DagRun succeeded
> > >
> > > My problem (unless I am missing something) is that on Day 4 Airflow
> > > would use the execution time from Day 3, so the interval for the
> > > incremental load would be since the last run (which failed). My hope is
> > > that it would use the last _successful_ run, so on Day 4 it would go
> > > back to Day 1. Is it possible to achieve this?
> > >
> > > I am aware of a manual backfill command via the CLI, but I am not sure I
> > > want to use it due to all the issues and inconsistencies I've read
> > > about.
> > >
> > > Thanks!
> >
> > --
> > *Joe Napolitano* | Sr. Data Engineer
> > www.blueapron.com | 5 Crosby Street, New York, NY 10013
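[Editor's note] The "only run latest" behavior discussed in this thread can be sketched outside Airflow as a plain function: of the schedule intervals missed while the DAG was paused, run only the most recent and mark the rest skipped. This is a minimal sketch of the idea behind the proposed per-DAG flag, not Airflow's actual scheduler code; the helper name `plan_catchup` is hypothetical.

```python
from datetime import date

def plan_catchup(missed_dates, run_latest_only=True):
    """Given schedule dates missed while a DAG was paused or down,
    decide which to run and which to mark skipped.

    Returns (to_run, to_skip). With run_latest_only=True, only the
    most recent interval runs -- the behavior Boris is asking for.
    """
    if not missed_dates:
        return [], []
    ordered = sorted(missed_dates)
    if run_latest_only:
        # Run only the newest interval; everything older is skipped.
        return [ordered[-1]], ordered[:-1]
    # Default backfill behavior: run every missed interval.
    return ordered, []

# Days 2 and 3 were missed, Day 4 is now due: only Day 4 should run.
missed = [date(2016, 10, 11), date(2016, 10, 12), date(2016, 10, 13)]
to_run, to_skip = plan_catchup(missed)
```

With `run_latest_only=False` the same function reproduces the backfill-everything default, which is the contrast PR #1830 is about.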

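[Editor's note] Boris's second question (derive the incremental window from the last *successful* run, not merely the previous run) can also be sketched as a plain function over the run history. This is an illustration only, with hypothetical names; it is not Airflow's API, though the same information could be obtained inside a DAG by querying the run states Airflow records.

```python
from datetime import date

def incremental_window(run_history, current_date):
    """run_history: list of (execution_date, state) tuples, oldest first.

    Returns (window_start, window_end): load everything since the last
    run that actually succeeded, so failed days are not silently lost.
    """
    succeeded = [d for d, state in run_history if state == "success"]
    # Fall back to the earliest known run if nothing has ever succeeded.
    window_start = max(succeeded) if succeeded else run_history[0][0]
    return window_start, current_date

history = [
    (date(2016, 10, 10), "success"),  # Day 1: loaded fine
    (date(2016, 10, 11), "failed"),   # Day 2: source down
    (date(2016, 10, 12), "failed"),   # Day 3: source down
]
# Day 4 loads everything since Day 1, not just since the failed Day 3.
start, end = incremental_window(history, date(2016, 10, 13))
```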