I'm hitting a bit of an annoying problem and wondering about the best course of action.

We have several dags:
- a daily ETL job
- several reporting jobs (daily, weekly or monthly) which use the data from previous ETL jobs

I wish to have a dependency such that the reporting jobs depend upon the last ETL job that the report uses. We're happy to set depends_on_past in the ETL job.

Daily jobs are easy - ExternalTaskSensor, job done.
Weekly jobs are a little trickier - we need to work out the execution_delta - normally +6 for us (we deliberately run a day late to prioritise other jobs). Monthly jobs.... this is where I'm struggling - how to work out the execution_delta. I guess the ideal would be an upgrade from timedelta to dateutil.relativedelta? tomorrow_ds and ds_add don't help either.

I must admit, ds being the time that's just gone has caused me no end of brain befudledness, especially when trying to get the initial job right (so much so that I wrote this up in our DAG README, posting here for others):

When adding a new job, it's critical to ensure that you've set the schedule correctly:
- frequency (monthly, weekly, daily)
- schedule_interval ("0 0 2 * *", "0 0 * * 0", "0 0 * * *")
- start_date (choose a day that matches schedule_interval at least one interval ago) -- e.g if today is Thursday 2016-06-09, go back in time to when the schedule will trigger, then work out what "ds" (execution date) would be (remembering that's the lapsed date)
--- for a monthly job, last trigger=2016-06-02, ds=2016-05-02
--- for a weekly job, last trigger=2016-06-05, ds=2016-05-29
--- for a daily job, last trigger=2016-06-09, ds=2016-06-08

Reply via email to