Hello Airflow community, I have a basic question about how best to model a common data pipeline pattern here at Dropbox.
At Dropbox, all of our logs are ingested and written into Hive as hourly and/or daily rollups. On top of this data we build many weekly and monthly rollups, which typically run on a daily cadence and compute results over a rolling window. For a given metric X, it seems natural to put the daily, weekly, and monthly rollups for metric X in the same DAG. However, the rollups have different dependency structures: the daily job depends on a single day partition, whereas the weekly job depends on 7 and the monthly on 28.

In Airflow, the two paradigms for modeling dependencies seem to be:

1) Depend on a *single run of a task* within the same DAG
2) Depend on *multiple runs of a task* by using an ExternalTaskSensor

I don't see how I could model this scenario with approach #1, and I'm not sure approach #2 is the most elegant or performant way to model it. Any thoughts or suggestions?
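To make approach #2 concrete, here is a rough sketch of what I have in mind for the weekly case. The DAG id, task ids, and the DummyOperator placeholders are just illustrative, and I'm assuming Airflow 1.10-style imports, so please treat this as a sketch rather than working production code:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.sensors.external_task_sensor import ExternalTaskSensor

    with DAG(
        dag_id="metric_x_rollups",          # hypothetical DAG holding all rollups for metric X
        start_date=datetime(2018, 1, 1),
        schedule_interval="@daily",
    ) as dag:
        # Placeholders for the real Hive rollup jobs.
        daily_rollup = DummyOperator(task_id="daily_rollup")
        weekly_rollup = DummyOperator(task_id="weekly_rollup")

        # One sensor per prior day in the rolling 7-day window, each waiting on an
        # earlier run of daily_rollup in this same DAG (offset via execution_delta).
        for days_back in range(1, 7):
            wait = ExternalTaskSensor(
                task_id="wait_daily_minus_{}".format(days_back),
                external_dag_id="metric_x_rollups",
                external_task_id="daily_rollup",
                execution_delta=timedelta(days=days_back),
            )
            wait >> weekly_rollup

        # The current day's partition is a normal in-DAG dependency.
        daily_rollup >> weekly_rollup

This works, but it means 6 sensors per weekly rollup and 27 per monthly rollup, each occupying a worker slot while it polls, which is what makes me question whether it's the right way to model this.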
