Happy 2024 everyone!

I’m going to kick off the new year by formally proposing a new AIP. It 
attempts to standardise the URI format used by Dataset events. The proposal is 
driven largely by the lack of adoption of Datasets. It turns out (maybe not 
surprisingly, when I think about it) that simply triggering events from a 
literal string name isn’t particularly usable in practical contexts, and 
“smarter” features are sought after in most cases.

The two most popular examples are listening on a directory for file additions, 
and making operators emit Dataset events automatically, like OpenLineage events. 
Both are technically doable, but rather impractical without abusing the literal 
string Dataset identifier. With standard semantics for the URI, both can be 
implemented more easily on the scheduler (listening) side instead.
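
To illustrate why a structured URI helps with the directory case, here is a 
rough sketch. It is not the format or implementation proposed in the AIP; the 
function name is_under and the example URIs are made up for illustration, and 
it only uses Python's standard library. The idea is that once an identifier 
parses as a URI, the listening side can check whether an event falls inside a 
watched directory instead of comparing opaque strings.

```python
from posixpath import normpath
from urllib.parse import urlsplit


def is_under(event_uri: str, watched_uri: str) -> bool:
    """Return True if event_uri points at something inside watched_uri.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    event, watched = urlsplit(event_uri), urlsplit(watched_uri)
    # Scheme and authority (e.g. the bucket or host) must match exactly.
    if (event.scheme, event.netloc) != (watched.scheme, watched.netloc):
        return False
    event_path = normpath(event.path or "/")
    # Force a trailing slash so "/incoming" does not match "/incoming-old".
    watched_path = normpath(watched.path or "/").rstrip("/") + "/"
    return event_path.startswith(watched_path)
```

With a plain string identifier, the same check would need an ad hoc naming 
convention agreed on by every DAG author; with a parsed URI, it becomes a 
comparison of scheme, authority, and path.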

Please find the document on Confluence:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-60+Standard+URI+representation+for+Airflow+Datasets

Comments on the specification and/or implementation, and proposals to add to 
the URI formats, are both welcome. Note that we don’t need to add all the 
formats in the AIP; it only attempts to establish a process for doing so, so we 
can add new ones to the documentation in the future.

TP
