If you use depends_on_past=True, a task won't run in the next DAG Run until its instance in the previous DAG Run has succeeded. So if Day 2 fails, Day 3 won't run.
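[Editor's note: for reference, a minimal sketch of where this flag is set. This is not code from the thread; the DAG name, dates, and retry settings are illustrative.]

```python
from datetime import datetime, timedelta

# depends_on_past is set per task, typically via default_args.
# When True, a task instance runs only after the same task
# succeeded in the previous DAG Run.
default_args = {
    'owner': 'airflow',
    'depends_on_past': True,               # hold Day N until Day N-1 succeeded
    'start_date': datetime(2016, 10, 1),   # illustrative
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

# Wiring it into a DAG (requires `from airflow import DAG`):
# dag = DAG('daily_full_reload', default_args=default_args,
#           schedule_interval='@daily')
```

Note that this flag holds back later runs; it does not skip the missed ones, which is what the Only_Run_Latest feature discussed in this thread is for.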
-s

On Thu, Oct 13, 2016 at 10:34 AM, siddharth anand <[email protected]> wrote:

> Yes! It does work with depends_on_past=True.
> -s

On Thu, Oct 13, 2016 at 10:28 AM, Boris Tyukin <[email protected]> wrote:

> Thanks so much, Sid! Just a follow-up question on "Only_Run_Latest": will it work with depends_on_past = True, or does it assume that flag is False?

On Thu, Oct 13, 2016 at 1:11 PM, siddharth anand <[email protected]> wrote:

> Boris,
>
> *Question 1*
> Only_Run_Latest is in master -
> https://github.com/apache/incubator-airflow/commit/edf033be65b575f44aa221d5d0ec9ecb6b32c67a
> That will solve your problem.
>
> Releases come out once a quarter, sometimes once every two quarters, so I would recommend that you run off master or off your own fork.
>
> You could also achieve this yourself with the following code snippet. It uses a ShortCircuitOperator that will skip downstream tasks if the DagRun being executed is not the current one. It will work for any schedule. The code below has essentially been implemented in the LatestOnlyOperator above for convenience.
>
>     import logging
>     from datetime import datetime
>
>     # Import path varies by Airflow version:
>     from airflow.operators import ShortCircuitOperator
>
>     def skip_to_current_job(ds, **kwargs):
>         now = datetime.now()
>         left_window = kwargs['dag'].following_schedule(kwargs['execution_date'])
>         right_window = kwargs['dag'].following_schedule(left_window)
>         logging.info('Left Window {}, Now {}, Right Window {}'.format(
>             left_window, now, right_window))
>         if not now <= right_window:
>             logging.info('Not latest execution, skipping downstream.')
>             return False  # ShortCircuitOperator skips all downstream tasks
>         return True
>
>     t0 = ShortCircuitOperator(
>         task_id='short_circuit_if_not_current',
>         provide_context=True,
>         python_callable=skip_to_current_job,
>         dag=dag,
>     )
>
> -s

On Thu, Oct 13, 2016 at 7:46 AM, Boris Tyukin <[email protected]> wrote:

> Hello all, and thanks for such an amazing project! I have been evaluating Airflow, spent a few days reading about it and playing with it, and I have a few questions that I struggle to understand.
>
> Let's say I have a simple DAG that runs once a day and does a full reload of tables from the source database, so the process is not incremental.
>
> Consider this scenario:
>
> Day 1 - OK
> Day 2 - Airflow scheduler or the server running Airflow is down for some reason (or the DAG is paused)
> Day 3 - still down (or the DAG is paused)
> Day 4 - server is up and now needs to run the missed jobs.
>
> How can I make Airflow run only the Day 4 job and not backfill Days 2 and 3?
>
> I tried depends_on_past = True, but it does not seem to do the trick.
>
> I also found this in a roadmap doc, but it seems it has not made it into a release yet:
>
> Only Run Latest - Champion: Sid
>
> • For cases where we need to only run the latest in a series of task instance runs and mark the others as skipped. For example, we may have a job to execute a DB snapshot every day. If the DAG is paused for 5 days and then unpaused, we don't want to run all 5, just the latest. With this feature, we will provide "cron" functionality for task scheduling that is not related to ETL.
>
> My second question: what if I have another DAG that does incremental loads from a source table?
>
> Day 1 - OK, loaded new/changed data for the previous day
> Day 2 - source system is down (or the DAG is paused), Airflow DagRun failed
> Day 3 - source system is down (or the DAG is paused), Airflow DagRun failed
> Day 4 - source system is up, Airflow DagRun succeeded
>
> My problem (unless I am missing something): on Day 4, Airflow would use the execution time from Day 3, so the interval for the incremental load would be everything since the last run (which failed). My hope is that it would use the last _successful_ run, so on Day 4 it would go back to Day 1. Is it possible to achieve this?
>
> I am aware of the manual backfill command in the CLI, but I am not sure I want to use it due to all the issues and inconsistencies I've read about.
>
> Thanks!
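[Editor's note: to see why the two-window check in Sid's snippet identifies the latest run, here is a standalone sketch of the same arithmetic for a daily schedule, using plain datetimes in place of the DAG's following_schedule. The daily interval and the function name are assumptions for illustration only.]

```python
from datetime import datetime, timedelta

def is_latest_run(execution_date, now, interval=timedelta(days=1)):
    # For a fixed schedule interval, following_schedule(d) is d + interval.
    left_window = execution_date + interval   # when this run was scheduled
    right_window = left_window + interval     # when the *next* run is scheduled
    # Only the latest run's right window still lies in the future.
    return now <= right_window

now = datetime(2016, 10, 13, 12, 0)
print(is_latest_run(datetime(2016, 10, 10), now))  # three-day-old run: False
print(is_latest_run(datetime(2016, 10, 12), now))  # yesterday's run: True
```

A backfilled run for Day 2, executed on Day 4, fails this check and short-circuits; only the run whose schedule window contains "now" proceeds.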
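[Editor's note: on the second question, the desired "window since the last successful run" can be pictured with a few lines of plain Python. The run-history list below stands in for what you would query from Airflow's metadata database; all names here are illustrative, not an Airflow API.]

```python
from datetime import date

# Stand-in for DagRun history: (execution_date, state), oldest first.
run_history = [
    (date(2016, 10, 10), 'success'),  # Day 1
    (date(2016, 10, 11), 'failed'),   # Day 2 - source system down
    (date(2016, 10, 12), 'failed'),   # Day 3 - source system down
]

def incremental_window_start(history):
    # Start the next load at the last *successful* run, so the failed
    # days are re-covered instead of silently skipped.
    successes = [d for d, state in history if state == 'success']
    return max(successes) if successes else None

start = incremental_window_start(run_history)
print(start)  # Day 4's load re-reads everything since Day 1
```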
