If you use depends_on_past=True, a task in the next DAG Run won't proceed if
its instance in the previous DAG Run failed. If Day 2 fails, Day 3 won't run.
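
For illustration, a minimal sketch of setting it (the dag id, dates, and task
here are placeholders, not from your DAG):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# With depends_on_past=True, a task instance is not scheduled until the
# same task's instance in the previous DAG Run has succeeded.
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2016, 10, 1),
    'depends_on_past': True,
}

dag = DAG('daily_full_reload', default_args=default_args,
          schedule_interval='@daily')

reload_tables = BashOperator(task_id='reload_tables',
                             bash_command='echo full reload',
                             dag=dag)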

-s

On Thu, Oct 13, 2016 at 10:34 AM, siddharth anand <san...@apache.org> wrote:

> Yes! It does work with depends_on_past=True.
> -s
>
> On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> thanks so much, Sid! just a follow-up question on "Only_Run_Latest" - will
>> it work with depends_on_past = True? or does it assume the DAG uses False?
>>
>> On Thu, Oct 13, 2016 at 1:11 PM, siddharth anand <san...@apache.org>
>> wrote:
>>
>> > Boris,
>> >
>> > *Question 1*
>> > Only_Run_Latest is in master -
>> > https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a.
>> > That will solve your problem.
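>> >
>> > If you do run off master, a rough sketch of wiring it in (I'm assuming
>> > the import path from that commit, and db_snapshot_task is a placeholder
>> > for your own task):
>> >
>> > from airflow.operators.latest_only_operator import LatestOnlyOperator
>> >
>> > # Downstream tasks are skipped for any DagRun that is not the latest.
>> > latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
>> > latest_only.set_downstream(db_snapshot_task)  # placeholder task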
>> >
>> > Releases come out once a quarter, sometimes once every two quarters, so I
>> > would recommend that you run off master or off your own fork.
>> >
>> > You could also achieve this yourself with the following code snippet. It
>> > uses a ShortCircuitOperator that will skip downstream tasks if the DagRun
>> > being executed is not the current one. It will work for any schedule. The
>> > code below has essentially been implemented in the LatestOnlyOperator
>> > above for convenience.
>> >
>> > import logging
>> > from datetime import datetime
>> >
>> > from airflow.operators.python_operator import ShortCircuitOperator
>> >
>> >
>> > def skip_to_current_job(ds, **kwargs):
>> >     now = datetime.now()
>> >     # The schedule window this DagRun covers: it opens at the schedule
>> >     # following execution_date and closes one interval after that.
>> >     left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
>> >     right_window = kwargs['dag'].following_schedule(left_window)
>> >     logging.info('Left Window {}, Now {}, Right Window {}'.format(
>> >         left_window, now, right_window))
>> >     if now > right_window:
>> >         # A newer DagRun is already due, so this one is not the latest;
>> >         # returning False makes ShortCircuitOperator skip downstream tasks.
>> >         logging.info('Not latest execution, skipping downstream.')
>> >         return False
>> >     return True
>> >
>> >
>> > t0 = ShortCircuitOperator(
>> >     task_id='short_circuit_if_not_current',
>> >     provide_context=True,
>> >     python_callable=skip_to_current_job,
>> >     dag=dag,
>> > )
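>> >
>> > To make it take effect, put t0 upstream of whatever you want skipped for
>> > the non-current runs, e.g. (full_reload is a placeholder for your task):
>> >
>> > t0.set_downstream(full_reload)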
>> >
>> >
>> > -s
>> >
>> >
>> > On Thu, Oct 13, 2016 at 7:46 AM, Boris Tyukin <bo...@boristyukin.com>
>> > wrote:
>> >
>> > > Hello all, and thanks for such an amazing project! I have been
>> > > evaluating Airflow, spent a few days reading about it and playing with
>> > > it, and I have a few questions that I am struggling with.
>> > >
>> > > Let's say I have a simple DAG that runs once a day and does a full
>> > > reload of tables from the source database, so the process is not
>> > > incremental.
>> > >
>> > > Let's consider this scenario:
>> > >
>> > > Day 1 - OK
>> > >
>> > > Day 2 - airflow scheduler or the server with airflow is down for some
>> > > reason (or DAG is paused)
>> > >
>> > > Day 3 - still down (or DAG is paused)
>> > >
>> > > Day 4 - server is up and now needs to run missing jobs.
>> > >
>> > >
>> > > How can I make Airflow run only the Day 4 job and not backfill Days 2
>> > > and 3?
>> > >
>> > >
>> > > I tried depends_on_past = True but it does not seem to do the trick.
>> > >
>> > >
>> > > I also found this in a roadmap doc, but it seems it has not made it into
>> > > a release yet:
>> > >
>> > >
>> > >  Only Run Latest - Champion : Sid
>> > >
>> > > • For cases where we need to only run the latest in a series of task
>> > > instance runs and mark the others as skipped. For example, we may have a
>> > > job to execute a DB snapshot every day. If the DAG is paused for 5 days
>> > > and then unpaused, we don’t want to run all 5, just the latest. With this
>> > > feature, we will provide “cron” functionality for task scheduling that is
>> > > not related to ETL.
>> > >
>> > >
>> > > My second question: what if I have another DAG that does incremental
>> > > loads from a source table:
>> > >
>> > >
>> > > Day 1 - OK, loaded new/changed data for the previous day
>> > >
>> > > Day 2 - source system is down (or DAG is paused), Airflow DagRun failed
>> > >
>> > > Day 3 - source system is down (or DAG is paused), Airflow DagRun failed
>> > >
>> > > Day 4 - source system is up, Airflow DagRun succeeded
>> > >
>> > > My problem (unless I am missing something): Airflow on Day 4 would use
>> > > the execution time from Day 3, so the interval for the incremental load
>> > > would be since the last run (which failed). My hope is that it would use
>> > > the last _successful_ run, so on Day 4 it would go back to Day 1. Is it
>> > > possible to achieve this?
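>> > >
>> > > To make it concrete, something like this sketch is what I am after (just
>> > > my guess at querying the metadata DB; I have not tried it):
>> > >
>> > > from airflow.models import DagRun
>> > > from airflow.utils.db import provide_session
>> > > from airflow.utils.state import State
>> > >
>> > > @provide_session
>> > > def last_successful_execution_date(dag_id, session=None):
>> > >     # Most recent DagRun of this DAG that actually succeeded.
>> > >     run = (session.query(DagRun)
>> > >                   .filter(DagRun.dag_id == dag_id,
>> > >                           DagRun.state == State.SUCCESS)
>> > >                   .order_by(DagRun.execution_date.desc())
>> > >                   .first())
>> > >     return run.execution_date if run else None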
>> > >
>> > > I am aware of the manual backfill command via the CLI, but I am not sure
>> > > I want to use it due to all the issues and inconsistencies I've read
>> > > about.
>> > >
>> > > Thanks!
>> > >
>> >
>>
>
>
