GitHub user FelipeRamos-neuro added a comment to the discussion: Expand dataset (asset) scheduling to tasks as well, changing the dag run to a deferred state while it awaits events triggered externally
A use case would be something like this: you want to automate the execution of multiple applications with a dag, but they all use different resources and possibly different infrastructures. One is a simple API call, another is an ETL pipeline running on external infrastructure provided by a client (whose completion time we don't necessarily know), and the last is a data validation step that runs on our own infrastructure. All of these run sequentially, each consuming the previous task's outputs.

In such a case, the dag would use a sensor to monitor the ETL pipeline. But since sensors are traditionally pull-based, the dag_run would have to stay in a running state, and the sensor would run indefinitely on the triggerer component.

If, instead of a sensor, I could define an asset that receives push-based events through the Airflow REST API (passing, for example, dag_id, run_id, and task_id as arguments), the dag_run could be set to some sort of deferred state until it resumes execution on that push-based event. In certain scenarios this would reduce resource usage, since it frees worker slots, and it avoids the process inefficiency of relying on a pull-based mechanism for monitoring.

GitHub link: https://github.com/apache/airflow/discussions/44816#discussioncomment-11523580
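The proposed flow above could be sketched as follows. This is a minimal illustration only: the endpoint path `/api/v2/assets/events`, the payload shape, and the idea of targeting a specific deferred task via `dag_id`/`run_id`/`task_id` in `extra` are all assumptions drawn from the proposal, not an existing Airflow API. (Airflow 2.9+ does expose a dataset-event endpoint, but without per-task targeting.)

```python
# Hypothetical sketch: an external system (e.g. the client's ETL pipeline)
# notifies Airflow that it has finished, so a deferred dag_run can resume.
# Endpoint path and payload fields are assumptions for illustration.
import json
from urllib import request


def build_asset_event(asset_uri: str, dag_id: str, run_id: str, task_id: str) -> dict:
    """Assemble the hypothetical event body targeting a deferred task."""
    return {
        "asset_uri": asset_uri,
        # The proposal suggests passing these identifiers so the scheduler
        # knows which deferred dag_run/task to wake up.
        "extra": {"dag_id": dag_id, "run_id": run_id, "task_id": task_id},
    }


def post_asset_event(base_url: str, token: str, event: dict) -> None:
    """POST the event to the (hypothetical) push-based asset-events endpoint."""
    req = request.Request(
        f"{base_url}/api/v2/assets/events",  # assumed path, not a real endpoint
        data=json.dumps(event).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    request.urlopen(req)  # raises on non-2xx responses


if __name__ == "__main__":
    # Example: the client's ETL job signals completion for a waiting task.
    event = build_asset_event(
        "s3://client-bucket/etl-done",  # hypothetical asset URI
        "pipeline_dag", "manual__2024-01-01", "await_client_etl",
    )
    print(json.dumps(event))
```

The key design point is that the producer of the event needs no Airflow worker or triggerer slot: the dag_run sits in a deferred state until this single POST arrives, instead of a sensor polling the external system.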
