Happy 2024, everyone! I’m going to kick off the new year by formally proposing a new AIP, which attempts to standardise the URI format used by Dataset events. The proposal is driven largely by the lack of adoption of Datasets. It turns out (perhaps unsurprisingly, when I think about it) that simply triggering events from a literal string name isn’t particularly usable in practice, and some “smarter” features are sought after in most cases.
The two most popular examples are listening on a directory for file additions, and making operators emit Dataset events automatically, much like OpenLineage events. Both are technically doable, but rather impractical without abusing the literal string Dataset identifier. With standard semantics for the URI, these can be implemented more easily on the scheduler (listening) side instead.

Please find the document on Confluence:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-60+Standard+URI+representation+for+Airflow+Datasets

Comments on the specification and/or implementation, as well as proposals for additional URI formats, are welcome. Note that we don’t need to add all the formats in the AIP; it only attempts to establish a process for doing so, so we can add new ones to the documentation in the future.

TP