eladkal commented on issue #30974:
URL: https://github.com/apache/airflow/issues/30974#issuecomment-2088039805

   > Should these rules be defined as Dataset args?
   
   It's a property of a specific DAG that uses the dataset. It's not a property 
of the dataset itself.
   
   > A common pattern that I have seen amongst Astronomer customers is for data 
producers to define datasets in a [consolidated 
file](https://github.com/astronomer/snowpatrol/blob/main/include/datasets.py) 
in order to make them discoverable for data consumers. Data consumers will then 
import them to use for scheduling purposes.
   
   I agree this is a common pattern.
   
   > (Side note: I think we will need to clean this up a lot as a part of 
Airflow 3, likely completely redesign the schedule API including how both 
timetables and datasets are passed in.)
   
   Possibly, but right now we don't know whether there will be an Airflow 3, or when.
   
   
   > Adding that flag on Dataset itself feels wrong to me since it’d force 
everything that depends on a dataset to have the same timeout.
   
   I agree. What I actually meant is:
   
   ```
   schedule=[
       Dataset("s3://dataset/example1.csv"),
       Dataset("s3://dataset/example2.csv"),
       wait_no_longer_than(Dataset("s3://dataset/example3.csv"), timedelta(hours=1)),
   ]
   ```
   
   From my point of view, `wait_no_longer_than` is not a property of the 
dataset but an option for how a DAG treats a dataset it depends on.
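
A minimal sketch of how such a wrapper could keep the timeout on the consumer side. Everything here is an assumption for illustration: `wait_no_longer_than` and `DatasetWithTimeout` are hypothetical names, not existing Airflow API, and `Dataset` is a small stand-in for `airflow.datasets.Dataset` so the snippet runs without Airflow installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Dataset:
    """Stand-in for airflow.datasets.Dataset (assumption, for illustration only)."""

    uri: str


@dataclass(frozen=True)
class DatasetWithTimeout:
    """Hypothetical wrapper pairing a dataset with one DAG's maximum wait."""

    dataset: Dataset
    max_wait: timedelta


def wait_no_longer_than(dataset: Dataset, max_wait: timedelta) -> DatasetWithTimeout:
    """Hypothetical helper: attach a timeout to a dependency, not to the dataset."""
    if max_wait <= timedelta(0):
        raise ValueError("max_wait must be positive")
    return DatasetWithTimeout(dataset=dataset, max_wait=max_wait)


# Defined once by the producer, e.g. in a consolidated datasets file.
example3 = Dataset("s3://dataset/example3.csv")

# Two consumer DAGs depend on the same dataset with different timeouts.
schedule_a = [
    Dataset("s3://dataset/example1.csv"),
    Dataset("s3://dataset/example2.csv"),
    wait_no_longer_than(example3, timedelta(hours=1)),
]
schedule_b = [wait_no_longer_than(example3, timedelta(hours=6))]
```

Because the timeout lives on the wrapper, the shared `Dataset` object stays unchanged and each consumer picks its own value. How the scheduler would act on `max_wait` is left open here.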


