@gwax added the LatestOnlyOperator https://github.com/apache/incubator-airflow/pull/1752
This is a really nifty operator, so I wanted to let folks know about it. A lot of people run cron for a mix of workloads. Some jobs map to traditional ETL workloads (e.g. load hourly data summarization for 2016-10-01T00:00:00Z). Some are simple cron tasks -- run a database backup every night. In the latter case, if you miss 3 runs (e.g. your dag is paused or your start date is a few days/weeks/months/ago), you don't want to make up for lost time and backfill all of those days. Essentially, running N database backups at once will take your database down... We'd prefer traditional cron behavior in these cases, not ETL behavior. *Enter the LatestOnlyOperator.* Place this operator upstream of any tasks that you want to skip unless the Dagrun is the latest. You can place a trigger rule downstream to "end" its effect. By combining a Trigger Rule with this operator, you can ensure only portions of your dag honor this "latest only" requirement. Or simply, have an entire DAG run in "latest only" mode by using the LatestOnlyOperator alone, i.e. not pairing it with a TriggerRule downstream. This is a useful pattern that I have been coding around for some time by using a ShortCircuitOperator with a python callable, where the callable evaluates the "latest"-ness of the dag run. I suspect we have all been re-inventing this wheel, which is where Airflow's Operators shine. Thanks to @gwax for implementing this and sticking with a long and often delayed review/merge process. -s
