jens-scheffler-bosch commented on issue #30974:
URL: https://github.com/apache/airflow/issues/30974#issuecomment-1529115113
Have you considered other options as alternatives? For example:
- Add a further attribute to the Dataset definition to mark which Datasets are a
  blocking dependency and which are not, e.g.
  `schedule=[Dataset("http://my/important/dataset"), Dataset("http://my/optional/dataset", optional=True)]`
- Add a timeout to the Dataset so that the scheduler takes care of the
  monitoring when a less important Dataset is not always required, e.g.
  `schedule=[Dataset("http://my/important/dataset"), Dataset("http://my/other/dataset", max_delay=60)]`
- Maybe more generic: provide an interface where I can plug in/hook my
  custom logic into the scheduling logic/Dataset evaluation, so that "whatever logic"
  can be implemented, e.g.
  `schedule=custom_inbox_handler([Dataset("http://my/important/dataset"), Dataset("http://my/other/dataset")])`
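
One possible reading of the three ideas above, as a rough plain-Python sketch. None of this is existing Airflow API: `DatasetSpec`, `default_ready`, `ReadyHandler` and `any_ready` are made-up names, `optional` and `max_delay` are the hypothetical attributes from the bullets, and the sketch assumes that `max_delay` is in seconds and is counted from the first update that arrives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical stand-in for a Dataset carrying the proposed attributes."""

    uri: str
    optional: bool = False             # idea 1: never blocks the run
    max_delay: Optional[float] = None  # idea 2: blocks only this long (assumed seconds)


def default_ready(
    specs: List[DatasetSpec], updated_at: Dict[str, float], now: float
) -> bool:
    """Decide whether a DAG run should be triggered.

    `updated_at` maps a dataset URI to the time of its latest unconsumed update.
    """
    arrived = [updated_at[s.uri] for s in specs if s.uri in updated_at]
    if not arrived:
        return False  # no update at all yet, so there is nothing to trigger on
    first_update = min(arrived)
    for spec in specs:
        if spec.uri in updated_at or spec.optional:
            continue  # already updated, or not a blocking dependency
        if spec.max_delay is not None and now - first_update >= spec.max_delay:
            continue  # waited long enough, stop blocking on this one
        return False  # a blocking dataset is still missing
    return True


# Idea 3: any callable with this signature could replace the default rule.
ReadyHandler = Callable[[List[DatasetSpec], Dict[str, float], float], bool]


def any_ready(
    specs: List[DatasetSpec], updated_at: Dict[str, float], now: float
) -> bool:
    """Example custom handler: trigger as soon as any one dataset is updated."""
    return any(s.uri in updated_at for s in specs)
```

With `DatasetSpec("http://my/important/dataset")` and `DatasetSpec("http://my/other/dataset", max_delay=60)`, `default_ready` keeps waiting while only the important one has been updated, and stops waiting for the other one once 60 seconds have passed since that update.
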
But maybe it depends on the use case in which multiple data sources, modelled
as Datasets, should be scheduled together. I can imagine only a few, but I am
probably not aware of all of them :-)
Anyway, I would rate it "cooler" if such dependencies and logic were taken
care of by the scheduling logic, rather than needing special logic in the DAG
via a sensor.