I think I mentioned this on another similar thread and I think our use case
might be somewhat similar. We have a daily ETL that loads data into a
database in one DAG, and we then need to run a weekly rollup report every
Tuesday that is in another DAG. The first DAG has a final
TriggerDagRunOperator that decides if today is Tuesday or not, and if yes,
triggers the weekly rollup DAG that operates on the data in the database at
that moment - which, since it's Tuesday, is all the data we want. If that
sounds like what you're trying to do, your first DAG might have a
TriggerDagRunOperator that decides if today is the first of the month, and
then triggers some other DAG.
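
In case it helps, here is a rough sketch of the decision logic. It assumes
the python_callable signature (context, dag_run_obj) that
TriggerDagRunOperator uses in our Airflow version, where returning the
object triggers the run and returning None skips it; please check that
against your version. The helper names (is_tuesday, is_first_of_month,
make_conditional_trigger) are made up for illustration.

```python
from datetime import date


def is_tuesday(day):
    # date.weekday() numbers Monday as 0, so Tuesday is 1.
    return day.weekday() == 1


def is_first_of_month(day):
    return day.day == 1


def make_conditional_trigger(predicate, today_fn=date.today):
    """Build a callable shaped like (context, dag_run_obj).

    Assumption: the operator triggers the target DAG when the callable
    returns dag_run_obj and skips it when the callable returns None.
    today_fn is injectable only so the logic can be tested.
    """
    def _callable(context, dag_run_obj):
        return dag_run_obj if predicate(today_fn()) else None
    return _callable
```

The idea would be to pass make_conditional_trigger(is_first_of_month) as
the python_callable of the final TriggerDagRunOperator in the daily DAG.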

Laura

On Thu, Jun 30, 2016 at 12:09 PM, Jeremiah Lowin <[email protected]> wrote:

> Interesting -- this could be an extension of open enhancement AIRFLOW-100
> https://issues.apache.org/jira/browse/AIRFLOW-100. Let me see if I can
> restate this correctly:
>
> - You have a daily ETL job
> - You have a monthly reporting job, for argument's sake let's say it runs on
> the last day of each month with an execution date equal to the last day of
> the prior month (for example on 7/31/2016 the task with execution date
> 6/30/2016 will run).
> - You want the monthly job with execution date 6/30/2016 to wait for (and
> include) the daily ETLs through 7/31/2016. In some months, that requires a
> 31 day delta, in others 30 (in others 28... and forget about leap years).
>
> It sounds like the simplest solution (and the one proposed in A-100) is to
> allow ExternalTaskSensor to accept not just a static delta, but potentially
> a callable that accepts the current execution date and returns the desired
> execution date for the sensed task. In this case, it would take in
> 6/30/2016 and return 7/31/2016 as the last day of the following month. I
> don't think any headway has been made on actually implementing the solution
> but it should be straightforward -- I will try to get to it if I have some
> time in the next few days.
>
>
> On Wed, Jun 29, 2016 at 11:25 AM Adrian Bridgett <[email protected]>
> wrote:
>
> > I'm hitting a bit of an annoying problem and wondering about the best
> > course of action.
> >
> > We have several dags:
> > - a daily ETL job
> > - several reporting jobs (daily, weekly or monthly) which use the data
> > from previous ETL jobs
> >
> > I wish to have a dependency such that the reporting jobs depend upon the
> > last ETL job that the report uses. We're happy to set depends_on_past
> > in the ETL job.
> >
> > Daily jobs are easy - ExternalTaskSensor, job done.
> > Weekly jobs are a little trickier - we need to work out the
> > execution_delta - normally +6 for us (we deliberately run a day late to
> > prioritise other jobs).
> > Monthly jobs.... this is where I'm struggling - how to work out the
> > execution_delta. I guess the ideal would be an upgrade from timedelta
> > to dateutil.relativedelta? tomorrow_ds and ds_add don't help either.
> >
> > I must admit, ds being the time that's just gone has caused me no end of
> > brain befuddledness, especially when trying to get the initial job right
> > (so much so that I wrote this up in our DAG README, posting here for
> > others):
> >
> > When adding a new job, it's critical to ensure that you've set the
> > schedule correctly:
> > - frequency (monthly, weekly, daily)
> > - schedule_interval ("0 0 2 * *", "0 0 * * 0", "0 0 * * *")
> > - start_date (choose a day that matches schedule_interval at least one
> > interval ago)
> > -- e.g. if today is Thursday 2016-06-09, go back in time to when the
> > schedule will trigger,
> >     then work out what "ds" (execution date) would be (remembering
> > that's the lapsed date)
> > --- for a monthly job, last trigger=2016-06-02, ds=2016-05-02
> > --- for a weekly job, last trigger=2016-06-05, ds=2016-05-29
> > --- for a daily job, last trigger=2016-06-09, ds=2016-06-08
> >
>
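
For the month-end case in the quoted thread, the date mapping Jeremiah
describes (execution date in, last day of the following month out) can be
sketched in plain Python. This is only the date arithmetic; whether
ExternalTaskSensor accepts such a callable depends on the enhancement
discussed in AIRFLOW-100, and the function name is hypothetical.

```python
import calendar
from datetime import datetime


def last_day_of_following_month(execution_date):
    """Map an execution date to the last day of the month after it."""
    # Roll over to January of the next year when the input is in December.
    year = execution_date.year + execution_date.month // 12
    month = execution_date.month % 12 + 1
    # monthrange returns (weekday of the 1st, number of days in the month),
    # so leap years are handled by the standard library.
    last_day = calendar.monthrange(year, month)[1]
    return execution_date.replace(year=year, month=month, day=last_day)
```

With the example from the thread, an input of 6/30/2016 maps to 7/31/2016.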
