GitHub user FelipeRamos-neuro edited a discussion: Expand dataset (asset) scheduling to tasks as well, changing the dag run to a deferred state while it waits for externally triggered events
At the company where I work, we ran into problems trying to run a DAG in a way that truly supports asynchronous execution driven by events from external services. Our workaround combines deferrable operators, a table that stores state changes between tasks, a sensor that checks for those state changes, and a callback API endpoint that external services call to update a task's state when they emit an event.

I haven't experimented much with data-aware scheduling, but as I understand it, it currently focuses on scheduling and triggering dag_runs. The idea here hinges on introducing some kind of deferred state for dag_runs, where external services could, perhaps, use the Airflow REST API endpoint for queued events to signal Airflow that the dag run can continue execution.

I'm not fully aware of how much complexity this would add to the scheduling process, but I think it could be a good feature to focus on after the release of Airflow 3.0.

GitHub link: https://github.com/apache/airflow/discussions/44816
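
For readers unfamiliar with the pattern, the workaround described in the post (a state table, a callback that external services invoke, and a sensor that polls for the change) might look roughly like the sketch below. It is not the poster's implementation and deliberately does not use the Airflow API: `sqlite3` stands in for the state table, a plain function stands in for the callback endpoint, and an `asyncio` coroutine stands in for the deferred sensor or trigger. The table name, the column names, the task key format and all function names are assumptions made for illustration.

```python
import asyncio
import sqlite3

# Hypothetical schema: one row per task that is waiting on an external event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_state (
    task_key TEXT PRIMARY KEY,
    state    TEXT NOT NULL
)
"""


def init_store(conn: sqlite3.Connection) -> None:
    """Create the state table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def register_waiting(conn: sqlite3.Connection, task_key: str) -> None:
    """Record that a task has deferred and is waiting for an external event."""
    conn.execute(
        "INSERT OR REPLACE INTO task_state (task_key, state) VALUES (?, 'waiting')",
        (task_key,),
    )
    conn.commit()


def external_callback(
    conn: sqlite3.Connection, task_key: str, new_state: str = "done"
) -> bool:
    """Stand-in for the callback endpoint: update the state of a waiting task.

    Returns False when no row exists for task_key.
    """
    cur = conn.execute(
        "UPDATE task_state SET state = ? WHERE task_key = ?",
        (new_state, task_key),
    )
    conn.commit()
    return cur.rowcount == 1


async def wait_for_state(
    conn: sqlite3.Connection,
    task_key: str,
    target: str = "done",
    poll_interval: float = 0.01,
    timeout: float = 1.0,
) -> bool:
    """Stand-in for the sensor: poll until the target state appears or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        row = conn.execute(
            "SELECT state FROM task_state WHERE task_key = ?", (task_key,)
        ).fetchone()
        if row is not None and row[0] == target:
            return True
        # Yield control so other coroutines can run while we wait.
        await asyncio.sleep(poll_interval)
    return False


async def demo() -> bool:
    """Simulate one task deferring and an external service completing it."""
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    key = "my_dag.my_task.run_1"  # hypothetical key format
    register_waiting(conn, key)

    waiter = asyncio.create_task(wait_for_state(conn, key))
    await asyncio.sleep(0.05)      # the external service does its work
    external_callback(conn, key)   # ...and then reports back
    return await waiter
```

Calling `asyncio.run(demo())` runs the whole round trip in memory. The polling loop is the part that a deferred state for dag_runs, signalled through the REST API as the post suggests, would presumably make unnecessary.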
