Hi Boris,

To answer the first question, the backfill command has a flag to mark jobs
as successful without running them. Take care to align the start and end
times precisely with the DAG's schedule. As an example, for a job that runs
daily at 7am:

airflow backfill -s 2016-10-07T07 -e 2016-10-10T07 my-dag-name -m

The "-m" parameter tells Airflow to mark it successful without running it.

On Thu, Oct 13, 2016 at 10:46 AM, Boris Tyukin <bo...@boristyukin.com>
wrote:

> Hello all, and thanks for such an amazing project! I have been evaluating
> Airflow, spent a few days reading about it and playing with it, and I have
> a few questions I am struggling with.
>
> Let's say I have a simple DAG that runs once a day and does a full reload
> of tables from the source database, so the process is not incremental.
>
> Let's consider this scenario:
>
> Day 1 - OK
>
> Day 2 - the airflow scheduler or the server running airflow is down for
> some reason (or the DAG is paused)
>
> Day 3 - still down (or the DAG is paused)
>
> Day 4 - the server is up and now needs to run the missing jobs.
>
>
> How can I make Airflow run only the Day 4 job and not backfill Days 2 and
> 3?
>
>
> I tried setting depends_on_past = True but it does not seem to do the
> trick.
>
>
> I also found this in a roadmap doc, but it seems it has not made it into a
> release yet:
>
>
>  Only Run Latest - Champion: Sid
>
> • For cases where we need to only run the latest in a series of task
> instance runs and mark the others as skipped. For example, we may have a
> job to execute a DB snapshot every day. If the DAG is paused for 5 days
> and then unpaused, we don’t want to run all 5, just the latest. With this
> feature, we will provide “cron” functionality for task scheduling that is
> not related to ETL.
>
>
> My second question: what if I have another DAG that does incremental loads
> from a source table:
>
>
> Day 1 - OK, loaded new/changed data for previous day
>
> Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
>
> Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
>
> Day 4 - source system is up, Airflow DagRun succeeded
>
>
> My problem (unless I am missing something) is that Airflow on Day 4 would
> use the execution time from Day 3, so the interval for the incremental
> load would start from the last run (which failed). My hope is that it
> would use the last _successful_ run, so that on Day 4 it would go back to
> Day 1. Is it possible to achieve this?
>
> I am aware of the manual backfill command in the CLI, but I am not sure I
> want to use it due to all the issues and inconsistencies I've read about.
>
> Thanks!
>



-- 
*Joe Napolitano* | Sr. Data Engineer
www.blueapron.com | 5 Crosby Street, New York, NY 10013
