Thanks Andy and Taylor for the suggestions -- I see how that would work for the case where you want a weekly rollup that runs on a weekly cadence.
But what about a rolling weekly or monthly rollup that runs each day? On Wed, Aug 8, 2018 at 11:00 AM, Andy Cooper <andy.coo...@astronomer.io> wrote: > To expand on Taylor's idea > > I recently wrote a ScheduleBlackoutSensor that would allow you to prevent a > task from running if it meets the criteria provided. It accepts an array of > args for any number of the criteria so you could leverage this sensor to > provide "blackout" runs for a range of days of the week. > > https://github.com/apache/incubator-airflow/pull/3702/files > > For example, > > task = ScheduleBlackoutSensor(day_of_week=[0,1,2,3,4,5], dag=dag) > > Would prevent a task from running Monday - Saturday, allowing it to run on > Sunday. > > You could leverage this Sensor as you would any other sensor or you could > invert the logic so that you would only need to specify > > task = ScheduleBlackoutSensor(day_of_week=6, dag=dag) > > To "whitelist" a task to run on Sundays. > > > Let me know if you have any questions > > On Wed, Aug 8, 2018 at 1:47 PM Taylor Edmiston <tedmis...@gmail.com> > wrote: > > > Gabriel - > > > > One approach I've seen for a similar use case is to have multiple related > > rollups in one DAG that runs daily, then have the non-daily tasks skip > most > > of the time (e.g., weekly only actually executes on Sundays and is > > parameterized to look at the last 7 days). > > > > You could implement that not running part a few ways, but one idea is a > > sensor in front of the weekly rollup task. Imagine a SundaySensor like > > return > > execution_date.weekday() == 6. One thing to keep in mind here is > > dependence on the DAG's cron schedule being more granular than the tasks. > > > > I think this could generalize into a DayOfWeekSensor / DayOfMonthSensor > > that would be nice to have. > > > > Of course this does mean some scheduler inefficiency on the skip days, > but > > as long as those skips are fast and the overall number of tasks is > small, I > > can accept that. > > > > *Taylor Edmiston* > > Blog <https://blog.tedmiston.com/> | CV > > <https://stackoverflow.com/cv/taylor> | LinkedIn > > <https://www.linkedin.com/in/tedmiston/> | AngelList > > <https://angel.co/taylor> | Stack Overflow > > <https://stackoverflow.com/users/149428/taylor-edmiston> > > > > > > On Wed, Aug 8, 2018 at 1:11 PM, Gabriel Silk <gs...@dropbox.com.invalid> > > wrote: > > > > > Hello Airflow community, > > > > > > I have a basic question about how best to model a common data pipeline > > > pattern here at Dropbox. > > > > > > At Dropbox, all of our logs are ingested and written into Hive in > hourly > > > and/or daily rollups. On top of this data we build many weekly and > > monthly > > > rollups, which typically run on a daily cadence and compute results > over > > a > > > rolling window. > > > > > > If we have a metric X, it seems natural to put the daily, weekly, and > > > monthly rollups for metric X all in the same DAG. > > > > > > However, the different rollups have different dependency structures. > The > > > daily job only depends on a single day partition, whereas the weekly > job > > > depends on 7, the monthly on 28. > > > > > > In Airflow, it seems the two paradigms for modeling dependencies are: > > > 1) Depend on a *single run of a task* within the same DAG > > > 2) Depend on *multiple runs of task* by using an ExternalTaskSensor > > > > > > I'm not sure how I could possibly model this scenario using approach > #1, > > > and I'm not sure approach #2 is the most elegant or performant way to > > model > > > this scenario. > > > > > > Any thoughts or suggestions? > > > > > >