jialerchew opened a new issue, #31013:
URL: https://github.com/apache/airflow/issues/31013

   ### Description
   
   **The current state:** For example, if a DAG is scheduled on 3 Datasets, 
Airflow schedules the DAG once all 3 Datasets have been updated at least once. 
This works well for most use cases.
   
   However, if one of the Dataset producers stops working and misses or fails 
some of its expected runs, the downstream DAG should no longer be executed, and 
the most recent Dataset update should be treated as invalid.
   
   A quick way to implement this would be a concept of "freshness", where a 
Dataset is only valid for a period of time.
   
   Using the example below, if `example_dataset_2` is already 24 hours old and 
there is a "freshness" threshold of **12 hours**, I don't want 
`example_dataset_2` to be counted as “updated” anymore. Hence, when 
`example_dataset_3` updates, the DAG will still not be triggered, because 
`example_dataset_2` has already passed the 12-hour freshness threshold.
   
   
![image](https://user-images.githubusercontent.com/8098670/235651695-6bbfa7e3-2030-449b-82bd-b3ef58e5dbdc.png)
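
   The freshness rule described above could be sketched in plain Python as 
follows. This is only an illustration of the requested behaviour, not Airflow's 
scheduler logic or API: `should_trigger`, `last_updates`, and the timestamps are 
hypothetical names and values chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def should_trigger(
    last_updates: Mapping[str, Optional[datetime]],
    now: datetime,
    threshold: timedelta,
) -> bool:
    """Return True only if every dataset has an update no older than `threshold`.

    Hypothetical helper: it models the proposed "freshness" rule and does not
    correspond to any existing Airflow function.
    """
    for updated_at in last_updates.values():
        if updated_at is None:
            # This dataset has never been updated.
            return False
        if now - updated_at > threshold:
            # The latest update is stale, so it no longer counts as "updated".
            return False
    return True


# Illustrative timestamps only.
now = datetime(2023, 5, 2, 12, 0, tzinfo=timezone.utc)
last_updates = {
    "example_dataset_1": now - timedelta(hours=1),
    "example_dataset_2": now - timedelta(hours=24),  # older than the threshold
    "example_dataset_3": now,  # just updated
}

# example_dataset_2 is stale, so the DAG is not triggered.
print(should_trigger(last_updates, now, timedelta(hours=12)))  # False
```

   With a check like this, the DAG would only run again once 
`example_dataset_2` receives a new update inside the 12-hour window.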
   
   
   ### Use case/motivation
   
   My team is trying to move towards "reactive" DAGs, where we don't want to 
schedule downstream DAGs and use sensors. This is because we are trying to 
reduce redundant DAG executions, and it makes it easier for the team to manually 
retrigger failed DAG runs (just trigger the upstream DAG, and the downstream 
DAGs run automatically).
   
   Datasets are a perfect fit for us; however, we are not completely 
comfortable switching to Datasets because they don't protect us from 
outdated Datasets. The regular scheduling + sensors combo does not face this 
issue because that method always refers to the exact upstream task run 
identified via `execution_delta`.
   
   We don't need such precision; a way to measure the "freshness" of Datasets 
would be good enough.
   
   Inspired by this 
[thread](https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1682791720916639) 
on Airflow Slack.
   
   ### Related issues
   
   Found this [other issue](https://github.com/apache/airflow/issues/30974), 
which is the complete opposite of this feature request.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

