Hi guys,
I am having trouble getting depends_on_past=True to work.
This is my pipeline:
a -> b -> c -> d -> e
a -> x -> e
a -> y -> e
I have default args for all Tasks:
from datetime import datetime, timedelta

scheduling_start_date = (datetime.utcnow() - timedelta(hours=1)).replace(
    minute=0, second=0, microsecond=0)
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': scheduling_start_date,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': default_retries_delay,
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
But specifically for tasks d, x and y, I set depends_on_past=True.
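To show what I mean by the override, here is a plain-Python sketch of how
I understand the per-task setting to combine with default_args: an
explicit task argument takes precedence over the default. This is only a
toy model; the helper effective_args and the trimmed-down default_args
are made up for illustration and are not Airflow functions or my real
config.

```python
# Toy model only: mimics "explicit task argument wins over default_args".
default_args = {'owner': 'airflow', 'depends_on_past': False, 'retries': 2}


def effective_args(defaults, **task_kwargs):
    """Return the arguments a task would end up with (hypothetical helper)."""
    merged = dict(defaults)      # copy so the shared defaults stay untouched
    merged.update(task_kwargs)   # explicit per-task values take precedence
    return merged


# Tasks d, x and y override the default; the others do not.
overrides = {name: {'depends_on_past': True} for name in ('d', 'x', 'y')}
task_args = {
    name: effective_args(default_args, **overrides.get(name, {}))
    for name in ('a', 'b', 'c', 'd', 'e', 'x', 'y')
}
```

So in this model d, x and y end up with depends_on_past=True, while a, b,
c and e keep the default of False.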
So now:
For the first hour, d, x and y failed.
So I am assuming that in the next hour these tasks should not even be
attempted, right?
But I see that in the next hour and in subsequent hours these tasks are
still getting triggered (and failing).
Should the behavior be that if a task's previous execution failed, no
attempt is made during the next run of the DAG?
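To make my expectation concrete, here is a toy model in plain Python of
the behaviour I am assuming. It is not Airflow's scheduler logic; the
function should_attempt and the state strings are my own illustration.

```python
def should_attempt(depends_on_past, previous_state):
    """Toy rule for what I expect; previous_state is None if no earlier run."""
    if not depends_on_past:
        return True
    # With depends_on_past set, only proceed if the last run succeeded
    # (or there was no last run at all).
    return previous_state in (None, 'success')


# Hour 1: no earlier run, so d is attempted (and, in my case, fails).
hour_1 = should_attempt(depends_on_past=True, previous_state=None)
# Hour 2: the previous run of d failed, so I expect no attempt.
hour_2 = should_attempt(depends_on_past=True, previous_state='failed')
# A task without depends_on_past (e.g. b) is attempted regardless.
hour_2_b = should_attempt(depends_on_past=False, previous_state='failed')
```

What I actually observe corresponds to hour_2 being True for d, x and y,
which is not what this model predicts.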
Or am I doing something very "bad" here?
Thanks,
Harish